Sonnet Tokenization Engine Quick Start

This quick start gets you up and running with Sonnet Tokenization Engine quickly and painlessly by launching Sonnet as a Docker container. If you launched Sonnet from the AWS Marketplace, you can skip installing Docker and jump down to the example commands.

First, install Docker if you haven’t already. Once done, run the following command:

docker run -p 9040:9040 -it mtnfog/sonnet:1.0.0
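This maps the container’s port 9040 to the same port on the host and runs Sonnet in an interactive terminal session.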

Once Sonnet has started, open a console and run the following command:

curl http://localhost:9040/api/tokenize -d "Tokenize this text please." -H "Content-Type: text/plain"

This sends a request to Sonnet to tokenize the given English text. If you launched Sonnet from the AWS Marketplace, you will need to substitute localhost with the public IP address of the EC2 instance. The response from Sonnet will look like:

["Tokenize","this","text","please"]

That’s it! The response contains the individual tokens of the text. If your text is not English, you can pass the language of the text in the request as its three-letter ISO 639-2 code:

curl "http://localhost:9040/api/tokenize?language=eng" -d "Tokenize this text please." -H "Content-Type: text/plain"

The NLP Building Blocks Java SDK can be used to create NLP pipelines and to integrate Sonnet into your existing NLP pipeline.