Apache NiFi 1.7.0 was recently announced, so we want to provide a guide to using NLP Flow with this release.
Use the link above to download Apache NiFi 1.7.0. Once downloaded, extract the archive to a location on your disk. For the rest of this guide we'll assume you extracted it to /opt/nifi-1.7.0.
Use the link above to download NLP Flow. Once downloaded, extract the files to Apache NiFi’s lib directory at /opt/nifi-1.7.0/lib. We can now start Apache NiFi:
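The start command itself is not shown here. A minimal sketch, assuming the stock bin/nifi.sh launcher that ships in the NiFi archive and the /opt/nifi-1.7.0 path used in this guide (the NIFI_HOME variable is only a convenience for this example):

```shell
# Assumption: NiFi was extracted to /opt/nifi-1.7.0, as described above.
# NIFI_HOME is a variable introduced for this sketch; override it if you
# extracted the archive somewhere else.
NIFI_HOME="${NIFI_HOME:-/opt/nifi-1.7.0}"

if [ -x "$NIFI_HOME/bin/nifi.sh" ]; then
  "$NIFI_HOME/bin/nifi.sh" start    # launch NiFi as a background process
  "$NIFI_HOME/bin/nifi.sh" status   # report whether NiFi is running
else
  echo "nifi.sh not found in $NIFI_HOME/bin" >&2
fi
```

NiFi can take a minute or so to finish starting, so the status call may need to be repeated before the web interface responds.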
Apache NiFi will now start running in the background.
NLP Building Blocks
We will use the Docker containers for the NLP Building Blocks. To do so, you must have git, docker, and docker-compose installed.
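Before cloning, it can help to confirm that the three prerequisites are actually on the PATH. The following is a small sketch rather than part of NLP Flow or the NLP Building Blocks; the missing_tools helper is a hypothetical name made up for this example:

```shell
# Hypothetical helper: print the name of every listed tool that is not on the PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Check the three tools this guide depends on.
missing=$(missing_tools git docker docker-compose)
if [ -n "$missing" ]; then
  echo "Missing prerequisites:" $missing >&2
else
  echo "All prerequisites found."
fi
```

If anything is reported missing, install it before continuing with the commands that follow.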
git clone https://github.com/mtnfog/nlp-building-blocks.git
cd nlp-building-blocks
docker-compose up
This will start the NLP Building Blocks.
Building a Flow
We can now open a browser to http://localhost:8080/nifi and see Apache NiFi’s canvas. NLP Flow’s processors for interacting with the NLP Building Blocks are available alongside the standard Apache NiFi processors. We can see the NLP Flow processors by filtering on the keyword “nlp” in the search:
The flow shown below performs named-entity extraction using the NLP Building Blocks containers that we started earlier. Files are read from the file system and split into sentences, the sentences are tokenized, entities are extracted from the tokens, and the entities are then stored in a MongoDB database. I did not use the Renku Language Detection Engine in this flow because it was known beforehand that all input files would be in English. Otherwise, a Renku Language Engine Processor along with a RouteOnAttribute processor would have been used to route each text through the appropriate part of the flow.