Idyl E3 with Apache NiFi

This page illustrates how to use Idyl E3 Entity Extraction Engine in big-data environments with Apache NiFi to perform entity extraction.

Apache NiFi

Apache NiFi is a popular application that makes complex data processing and migration simple through the creation of dataflow pipelines. We can utilize Idyl E3 to perform entity extraction inside of a NiFi pipeline. For example, a NiFi pipeline can be created that processes files from the file system, sends the content to Idyl E3 for entity extraction, and then persists the extracted entities to a database such as MongoDB or a search application like Elasticsearch.

The components of NiFi that perform the work are called processors. While you can accomplish the workflow described above using only NiFi’s built-in processors (such as InvokeHTTP), we have made a few custom NiFi processors to make working with Idyl E3 easier.
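To make the built-in route concrete, the following is a minimal sketch of the kind of HTTP request a generic HTTP processor would issue on the pipeline's behalf. The host, port, and /api/extract path are placeholders chosen for illustration, not confirmed details of the Idyl E3 API; consult the Idyl E3 documentation for the real endpoint. The request is only built here, not sent.

```python
import urllib.request

# Assumption: host, port, and path are placeholders for illustration,
# not confirmed details of the Idyl E3 API.
IDYL_E3_URL = "http://localhost:9000/api/extract"


def build_extract_request(text, url=IDYL_E3_URL):
    """Build (but do not send) a POST request carrying the text to analyze."""
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


request = build_extract_request("George Washington was president.")
# Sending it would require a running service:
#   urllib.request.urlopen(request)
```

The custom processors described below take care of this request handling, so the flow itself only needs to route content.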

NiFi Processors

Idyl E3 NiFi Processor

This NiFi processor provides access to Idyl E3’s API. Powered by the Idyl E3 client Java SDK, this processor allows your pipeline to send text to Idyl E3 for entity extraction, annotation, sanitization, and ingest.

Entity Query Language (EQL) NiFi Processor

This NiFi processor allows you to execute EQL queries against extracted entities. The EQL queries act as filters: when one or more entities satisfy an EQL query, those entities are output from the processor. This allows you to trigger an action, such as a notification, when a certain entity is extracted. For example, the EQL query select * from entities where text = "George Washington" will cause only entities whose text is “George Washington” to continue through the NiFi dataflow.
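The filtering behavior of that example query can be sketched in a few lines. This is not the EQL engine; it is a plain equality filter that mimics the effect of where text = "George Washington". The entity dictionaries and their "text" and "type" keys are a hypothetical shape used only for illustration.

```python
def filter_by_text(entities, wanted_text):
    """Keep only the entities whose text matches exactly, as the query above would."""
    return [entity for entity in entities if entity.get("text") == wanted_text]


# Hypothetical entities; the real output format of Idyl E3 may differ.
entities = [
    {"text": "George Washington", "type": "person"},
    {"text": "Mount Vernon", "type": "place"},
]

matched = filter_by_text(entities, "George Washington")
```

Only the matching entity continues downstream; when nothing matches, the result is empty and no downstream action is triggered.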

NiFi Dataflow with Idyl E3

NiFi templates that use Idyl E3 are available on GitHub.

In the first template, the dataflow process will read files from the local file system. The contents of each file will be passed to an Idyl E3 NiFi processor where the contents will be sent to Idyl E3 for entity extraction. The entities then pass through an EQL NiFi processor. The entities that satisfy the EQL query will be split by the SplitJson processor and then be persisted to a MongoDB database. This template provides a complete ingest process for text consumption, entity extraction, filtering, and entity persistence.

In the sample template, the EQL query is select * from entities, which allows all entities to pass through and be stored in MongoDB.
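The filter, split, and persist stages of this template can be sketched as ordinary functions. This is an illustration of the data movement only, under several assumptions: the entity JSON shape is hypothetical, the keep function stands in for the EQL processor, and a plain list stands in for the MongoDB collection.

```python
import json


def split_entities(entities_json):
    """Mimic the split step: one JSON array in, one JSON document per element out."""
    return [json.dumps(entity) for entity in json.loads(entities_json)]


def run_flow(entities_json, keep, store):
    """Filter the entities, split them, then persist each one separately."""
    kept = [entity for entity in json.loads(entities_json) if keep(entity)]
    for document in split_entities(json.dumps(kept)):
        store.append(json.loads(document))  # stand-in for a database insert
    return store


# Hypothetical extraction result; the real Idyl E3 output may differ.
extracted = json.dumps([{"text": "George Washington"}, {"text": "Virginia"}])

# keep always returns True, like "select * from entities" in the sample template.
stored = run_flow(extracted, keep=lambda entity: True, store=[])
```

Swapping the keep function for a stricter predicate corresponds to replacing the template's EQL query with a more selective one.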

Capturing Entities on Edge Devices

Apache MiNiFi

Apache MiNiFi is a subproject of NiFi that was created to capture data on edge devices and transfer it into a remote NiFi dataflow. Idyl E3 and Apache MiNiFi can be used together to extract entities on edge devices, pushing the entity extraction down to the data sources. This avoids having a single Idyl E3 installation serve an entire pipeline and allows for increased throughput. The Idyl E3 NiFi Processor and the EQL Processor can both be used in a MiNiFi flow to extract entities and transfer them to a NiFi dataflow.

Entity Extraction from Natural Language Text in a Data Flow Pipeline