Sonnet Tokenization Engine FAQ
What is Sonnet Tokenization Engine?
Many NLP systems operate on the individual tokens (typically words) of text instead of on the text as a whole. Sonnet Tokenization Engine breaks up input text into its individual tokens. For best results, the input text should be a single sentence.
What is “tokenization” and what are “tokens”?
Tokenization is the process of breaking text into tokens, the individual units (typically words) that make up the text. For example, given the sentence “The dog is black.”, tokenization yields the individual tokens of the sentence: [“The”, “dog”, “is”, “black”].
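To make the idea concrete, here is a minimal sketch of tokenization in Python. It is not how Sonnet works internally; it is a naive regular-expression approach that keeps runs of word characters and drops punctuation, and the function name `simple_tokenize` is made up for this illustration.

```python
import re


def simple_tokenize(sentence: str) -> list[str]:
    """Naive illustration: return runs of word characters, dropping punctuation.

    This is NOT Sonnet's algorithm, only a toy that shows what tokens are.
    """
    return re.findall(r"\w+", sentence)


print(simple_tokenize("The dog is black."))
# prints ['The', 'dog', 'is', 'black']
```

A toy like this breaks down quickly on contractions, abbreviations, and languages that do not separate words with spaces, which is the kind of input a dedicated tokenization engine is meant to handle.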
What languages does Sonnet Tokenization Engine support?
Sonnet supports the languages listed below.
| Language | ISO 639-3 Code |
How do I use Sonnet Tokenization Engine?
Sonnet exposes a REST API. To tokenize text, submit it to Sonnet through that API. For an example, see the Quick Start.
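As a rough sketch of what submitting text over a REST API can look like from Python, the example below uses only the standard library. The host, port, endpoint path (`/api/tokenize`), plain-text request body, and JSON-array response format are all assumptions made for illustration; the Quick Start is the authoritative source for the actual values.

```python
import json
import urllib.request

# Assumptions for illustration only; take the real values from the Quick Start.
BASE_URL = "http://localhost:8080"  # hypothetical host and port
TOKENIZE_PATH = "/api/tokenize"     # hypothetical endpoint path


def build_request(text: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request that carries the sentence as a plain-text body."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + TOKENIZE_PATH,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


def tokenize(text: str, base_url: str = BASE_URL) -> list[str]:
    """Send the sentence and decode the reply.

    The reply is ASSUMED to be a JSON array of strings.
    """
    request = build_request(text, base_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a Sonnet instance running at the assumed address, `tokenize("The dog is black.")` would send one sentence per request, in line with the single-sentence recommendation above.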
How much does Sonnet Tokenization Engine cost?
Sonnet is free when used to tokenize English text. A license is required to tokenize other languages.
How do I get Sonnet Tokenization Engine?
Sonnet is available as a direct download, on the AWS Marketplace, and on Docker Hub.