NLP Building Blocks with Apache NiFi 1.7.0

Update: Launch NLP Flow in Amazon Web Services!

Apache NiFi 1.7.0 was recently announced, so this post is a guide to using NLP Flow with it.

Apache NiFi

Download Apache NiFi 1.7.0

Use the link above to download Apache NiFi 1.7.0. Once downloaded, extract the archive somewhere on your disk. The rest of this guide assumes you extracted it to /opt/nifi-1.7.0.
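As a rough sketch, the extraction step might look like the following. The archive name nifi-1.7.0-bin.tar.gz and the helper name extract_archive are assumptions made for illustration; substitute the file you actually downloaded. Writing to /opt usually requires elevated permissions.

```shell
# Extract a .tar.gz archive into a destination directory, creating it if needed.
# extract_archive is a hypothetical helper name used only for this sketch.
extract_archive() {
  archive="$1"
  dest="$2"
  mkdir -p "$dest" && tar -xzf "$archive" -C "$dest"
}

# Example (assumed file name; run with permission to write to /opt):
#   extract_archive nifi-1.7.0-bin.tar.gz /opt
#   ls /opt/nifi-1.7.0/bin/nifi.sh
```

If the final ls finds nifi.sh, the archive landed in the location the rest of this guide assumes.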

NLP Flow

Download NLP Flow

Use the link above to download NLP Flow. Once downloaded, extract the files to Apache NiFi’s lib directory at /opt/nifi-1.7.0/lib. We can now start Apache NiFi:

/opt/nifi-1.7.0/bin/nifi.sh start

Apache NiFi will now start running in the background.
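Because NiFi starts in the background, the web interface may not respond right away. The sketch below is one way to wait for it: a small retry helper that can wrap any command. The name wait_until, the attempt count, and the delay are assumptions, and the example presumes curl is installed and that NiFi is listening on its default port 8080.

```shell
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_until <attempts> <delay-seconds> <command> [args...]
# wait_until is a hypothetical helper name used only for this sketch.
wait_until() {
  attempts="$1"
  delay="$2"
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example: ask NiFi for its status, then poll the UI for up to 30 x 5 seconds.
#   /opt/nifi-1.7.0/bin/nifi.sh status
#   wait_until 30 5 curl -fsS -o /dev/null http://localhost:8080/nifi
```

The helper returns a non-zero status if the command never succeeds, so it can gate later steps in a script.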

NLP Building Blocks

We will run the NLP Building Blocks as Docker containers. To do so, you must have git, docker, and docker-compose installed. Clone the repository and start the containers:

git clone https://github.com/mtnfog/nlp-building-blocks.git
cd nlp-building-blocks
docker-compose up

This will start the NLP Building Blocks.
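Note that docker-compose up keeps the containers attached to the terminal. If you would rather get the prompt back, a sketch along these lines starts them detached and lets you check on them. The function names and the COMPOSE variable are assumptions made for illustration; the up -d, ps, and down subcommands are standard docker-compose.

```shell
# Command used to talk to Compose; override COMPOSE if yours is named differently.
COMPOSE="${COMPOSE:-docker-compose}"

# Start the services in the background (detached) instead of the foreground.
start_blocks() {
  "$COMPOSE" up -d
}

# List the services and their current state.
show_blocks() {
  "$COMPOSE" ps
}

# Stop and remove the containers when finished.
stop_blocks() {
  "$COMPOSE" down
}

# Example, run from inside the nlp-building-blocks directory:
#   start_blocks && show_blocks
```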

Building a Flow

We can now open a browser to http://localhost:8080/nifi and see Apache NiFi’s canvas. NLP Flow’s processors for interacting with the NLP Building Blocks are available alongside the standard Apache NiFi processors. To see the NLP Flow processors, filter on the keyword “nlp” in the processor search.

The flow for this example performs named-entity extraction using the NLP Building Blocks containers that we started earlier. Files are read from the file system and separated into sentences, the sentences are tokenized, entities are extracted from the tokens, and the entities are then stored in a MongoDB database. I did not use the Renku Language Detection Engine in this flow because all input files were known beforehand to be in English. Otherwise, a Renku Language Engine Processor along with a RouteOnAttribute processor would have been used to route each text through the flow according to its detected language.
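To make the order of the stages concrete, here is a toy shell pipeline with the same shape. It is not the NiFi flow and does not call the NLP Building Blocks: every function is a deliberately naive stand-in with a hypothetical name, the "entity" rule simply keeps capitalized tokens, and a plain file takes the place of MongoDB.

```shell
# Toy stand-ins for the flow's stages; none of these call the NLP Building Blocks.

# Stage 1: put each sentence on its own line (naive split after ., ! or ?).
split_sentences() {
  awk '{ gsub(/[.!?] +/, "&\n"); print }' | sed 's/ *$//'
}

# Stage 2: emit one token per line, dropping punctuation.
tokenize() {
  LC_ALL=C tr -cs 'A-Za-z0-9' '\n'
}

# Stage 3: crude "entity" guess: keep tokens that start with a capital letter.
extract_entities() {
  LC_ALL=C grep '^[A-Z]'
}

# Stage 4: store the unique results (a file here, where the real flow uses MongoDB).
store_entities() {
  LC_ALL=C sort -u > "$1"
}

# Example:
#   cat input.txt | split_sentences | tokenize | extract_entities | store_entities entities.txt
```

The capital-letter rule also keeps ordinary words that begin a sentence, such as "He", which is the kind of error a trained entity-extraction model is meant to avoid.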
