Sonnet Tokenization Engine Quick Start

This quick start gets you up and running with Sonnet Tokenization Engine quickly and painlessly.

Getting Sonnet Tokenization Engine

Sonnet is available on the AWS Marketplace, on the Azure Marketplace, and as a Docker image.


First install Docker if you haven’t already. Once done, run the following command to start Sonnet.

docker run -p 9040:9040 -it mtnfog/sonnet:1.1.0

Interacting with Sonnet Tokenization Engine

Once Sonnet has started, open a console and run the following command:

curl http://localhost:9040/api/tokenize -d "Tokenize this text please." -H "Content-Type: text/plain"

This sends a request to Sonnet to tokenize the given English text. If you launched Sonnet from the AWS or Azure marketplaces, you will need to substitute localhost with the public IP address of the virtual machine. The response from Sonnet will look like:


That’s it! The response contains the individual tokens of the text. If your text is not English, you can pass its language in the request with the language query parameter. The value should be the three-letter ISO 639-2 language code; the example below uses eng (English).

curl "http://localhost:9040/api/tokenize?language=eng" -d "Tokenize this text please." -H "Content-Type: text/plain"
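
If you call Sonnet often, or from a marketplace virtual machine instead of localhost, the two curl commands above can be wrapped in a small shell helper. The following is a minimal sketch, not part of Sonnet: the names SONNET_HOST, SONNET_PORT, sonnet_url, and sonnet_tokenize are hypothetical and chosen only for illustration. It assumes the same port (9040), endpoint (/api/tokenize), and language parameter used in the commands above.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; these names are not provided by Sonnet.

# Host running Sonnet: localhost for Docker, or the public IP address of the
# virtual machine when launched from the AWS or Azure marketplaces.
SONNET_HOST="${SONNET_HOST:-localhost}"
SONNET_PORT="${SONNET_PORT:-9040}"

# Print the tokenize URL. If a language code is passed as the first argument,
# it is appended as the language query parameter.
sonnet_url() {
  url="http://${SONNET_HOST}:${SONNET_PORT}/api/tokenize"
  if [ -n "${1:-}" ]; then
    url="${url}?language=$1"
  fi
  printf '%s\n' "$url"
}

# POST the text in $1 as text/plain and print the response from Sonnet.
# $2 is an optional three-letter language code.
sonnet_tokenize() {
  curl --silent --show-error "$(sonnet_url "${2:-}")" \
    -d "$1" \
    -H "Content-Type: text/plain"
}
```

With these functions loaded into your shell, running sonnet_tokenize "Tokenize this text please." eng sends the same request as the command above. To target a marketplace virtual machine, set SONNET_HOST to its public IP address before loading the functions.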

The NLP Building Blocks Java SDK can be used to create NLP pipelines and to integrate Sonnet into your existing NLP pipeline.