Introducing NLP Flow

Today we are introducing NLP Flow, a collection of processors for the popular Apache NiFi data platform to support NLP pipeline data flows.

Apache NiFi is a cross-platform tool for creating and managing data flows. With Apache NiFi you can create flows to ingest data from a multitude of sources, perform transformations and logic on the data, and interface with external systems. Apache NiFi is a stable and proven platform used by companies worldwide.

Extending Apache NiFi to support NLP pipelines is a natural fit. NLP Flow is, in Apache NiFi terminology, a set of processors that perform NLP tasks via our NLP Building Blocks. With NLP Flow, you can create NLP pipelines inside Apache NiFi to perform language identification, sentence extraction, text tokenization, and named-entity extraction. For example, a pipeline that ingests text from HDFS, extracts all named-person entities from English and Spanish text, and persists the entities to a MongoDB database can be managed and executed entirely within Apache NiFi.

NLP Flow is free for everyone to use. It requires an existing Apache NiFi installation; Apache NiFi itself is a free download.

Simplified Named-Entity Extraction Pipeline in Idyl NLP

Idyl NLP 1.1.0 introduces a simplified named-entity extraction pipeline that can be created in just a few lines of code. The following code block shows how to make a pipeline to extract named-person entities from natural language English text in Idyl NLP.

// Build a named-entity extraction pipeline for English.
NerPipelineBuilder builder = new NerPipeline.NerPipelineBuilder();
NerPipeline pipeline = builder.build(LanguageCode.en);

// Run the pipeline on the input text.
EntityExtractionResponse response = pipeline.run("George Washington was president.");

// Print each extracted entity.
for (Entity entity : response.getEntities()) {
  System.out.println(entity.toString());
}

When you run this code, a single line is printed to the screen:

Text: George Washington; Confidence: 0.96; Type: person; Language Code: eng; Span: [0..2);

Internally, the pipeline creates a sentence detector, a tokenizer, and a named-entity recognizer for the given language. Currently only person entities in English text are supported, but we will be adding support for more languages and more entity types in the future. The goal of this functionality is to reduce the amount of code needed to perform a complex operation like named-entity extraction. The NerPipeline class is new in Idyl NLP 1.1.0-SNAPSHOT.
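To make the three internal stages concrete, here is a minimal, self-contained sketch of how a sentence detector, a tokenizer, and an entity recognizer can be chained. This is not Idyl NLP's implementation: the PipelineSketch class, its methods, and the fixed list of known names are all hypothetical, and the regex-based splitting and dictionary lookup stand in for the trained models a real pipeline would use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy illustration only: none of these names come from Idyl NLP.
class PipelineSketch {

    // A found entity with its token span [start..end) within its sentence.
    static final class Entity {
        final String text;
        final int start;
        final int end;

        Entity(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "Text: " + text + "; Type: person; Span: [" + start + ".." + end + ");";
        }
    }

    // Hypothetical stand-in for a trained model: a fixed list of known names.
    private static final List<String> KNOWN_PERSONS = Arrays.asList("George Washington");

    // Stage 1: split text into sentences after '.', '!' or '?' followed by whitespace.
    static List<String> detectSentences(String text) {
        return Arrays.asList(text.trim().split("(?<=[.!?])\\s+"));
    }

    // Stage 2: drop trailing sentence punctuation and split on whitespace.
    static List<String> tokenize(String sentence) {
        String stripped = sentence.replaceAll("[.!?]+$", "").trim();
        return Arrays.asList(stripped.split("\\s+"));
    }

    // Stage 3: find known names as contiguous token runs.
    static List<Entity> recognize(List<String> tokens) {
        List<Entity> entities = new ArrayList<>();
        for (String name : KNOWN_PERSONS) {
            List<String> nameTokens = Arrays.asList(name.split(" "));
            for (int i = 0; i + nameTokens.size() <= tokens.size(); i++) {
                if (tokens.subList(i, i + nameTokens.size()).equals(nameTokens)) {
                    entities.add(new Entity(name, i, i + nameTokens.size()));
                }
            }
        }
        return entities;
    }

    // Chain the three stages: sentences, then tokens, then entities.
    static List<Entity> run(String text) {
        List<Entity> entities = new ArrayList<>();
        for (String sentence : detectSentences(text)) {
            entities.addAll(recognize(tokenize(sentence)));
        }
        return entities;
    }

    public static void main(String[] args) {
        for (Entity entity : run("George Washington was president.")) {
            System.out.println(entity);
        }
    }
}
```

For the example sentence, the sketch reports "George Washington" with the token span [0..2), the same span shown in the output above. A real recognizer would also attach a confidence score and a language code, which this toy version leaves out.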

Idyl NLP is our open-source, Apache-licensed NLP framework for Java. Its releases are available in Maven Central and daily snapshots are also available. See Idyl NLP on GitHub at https://github.com/idylnlp/idylnlp for the code, examples, and documentation. Idyl NLP powers our NLP Building Blocks.