Sonnet Tokenization Engine

Sonnet Tokenization Engine is free to use for tokenizing English text. Get Sonnet.

Given natural language text as input, Sonnet identifies the individual tokens in that text.
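To make "tokens" concrete, here is a minimal sketch of what tokenization produces. It is not Sonnet's algorithm: the regular expression and the function name tokenize are illustrative assumptions that simply split English text into words and punctuation marks.

```python
import re
from typing import List

# Illustrative pattern (an assumption, not Sonnet's rules):
# a run of word characters, optionally joined by apostrophes,
# or any single character that is neither a word character nor whitespace.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)*|[^\w\s]")


def tokenize(text: str) -> List[str]:
    """Return the tokens found in text, in order of appearance."""
    return TOKEN_PATTERN.findall(text)


print(tokenize("George Washington was president."))
# ['George', 'Washington', 'was', 'president', '.']
```

A production tokenizer handles many cases this pattern ignores, such as abbreviations, URLs, and languages that do not separate words with spaces.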

Sonnet is one of our NLP Building Blocks.

Nearly all NLP systems require tokenized text as input. Entity extraction, text classification, and sentence detection are examples of NLP tasks that require tokenized text. Sonnet Tokenization Engine provides a lightweight and scalable means of tokenization.

Languages

Sonnet supports the languages listed below.

Language        ISO 639-3 Code
Arabic          ara
Belarusian      bel
Bulgarian       bul
Catalan         cat
Czech           ces
Danish          dan
German          deu
Modern Greek    ell
English         eng
Estonian        est
Finnish         fin
French          fra
Irish           gle
Hebrew          heb
Hindi           hin
Croatian        hrv
Hungarian       hun
Indonesian      ind
Icelandic       isl
Italian         ita
Japanese        jpn
Korean          kor
Latvian         lav
Lithuanian      lit
Macedonian      mkd
Maltese         mlt
Malay           msa
Dutch           nld
Norwegian       nor
Polish          pol
Portuguese      por
Romanian        ron
Russian         rus
Slovak          slk
Slovene         slv
Spanish         spa
Albanian        sqi
Serbian         srp
Swedish         swe
Thai            tha
Turkish         tur
Ukrainian       ukr
Vietnamese      vie
Chinese         zho

REST API

Sonnet’s REST API accepts input text and returns the individual tokens of that text. The API optionally accepts a language parameter that indicates the language of the input text.
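The request described above might be sketched as follows. This page does not document the endpoint, so the host, the path /api/tokenize, the query parameter name language, the use of an ISO 639-3 code as its value, and the JSON-array response format are all assumptions made for illustration. Consult the Sonnet API documentation for the actual values.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Optional

# Placeholder values, not Sonnet's documented endpoint.
BASE_URL = "http://sonnet.example.com"
TOKENIZE_PATH = "/api/tokenize"


def build_tokenize_request(
    text: str, language: Optional[str] = None, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the input text."""
    url = base_url + TOKENIZE_PATH
    if language is not None:
        # Assumed: the optional language is passed as a query parameter.
        url += "?" + urllib.parse.urlencode({"language": language})
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


def parse_tokens(body: bytes) -> List[str]:
    """Decode a response body, assumed to be a JSON array of token strings."""
    return json.loads(body.decode("utf-8"))


def tokenize(text: str, language: Optional[str] = None) -> List[str]:
    """Send the request and return the tokens from the response."""
    request = build_tokenize_request(text, language)
    with urllib.request.urlopen(request) as response:
        return parse_tokens(response.read())
```

Keeping request construction and response parsing separate from the network call lets each part be checked without a running Sonnet instance.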
