Idyl E3 in Big-Data Environments with Apache NiFi

This page shows how to use the Idyl E3 Entity Extraction Engine with Apache NiFi to perform entity extraction in big-data environments.

Apache NiFi is a popular application that simplifies complex data processing and migration by letting you build dataflow pipelines. Idyl E3 can perform entity extraction inside a NiFi pipeline. For example, a pipeline can read files from the file system, send their content to Idyl E3 for entity extraction, and then persist the extracted entities to a database such as MongoDB.

The components of NiFi that perform the work are called processors. While you can accomplish the workflow described above using only NiFi’s built-in processors (such as InvokeHTTP), we have made a few custom NiFi processors to make working with Idyl E3 easier.

NiFi Processors

Idyl E3 NiFi Processor

This NiFi processor provides access to Idyl E3’s API. Powered by the Idyl E3 client Java SDK, this processor allows your pipeline to send text to Idyl E3 for entity extraction, annotation, sanitization, and ingest.

  • The Idyl E3 NiFi Processor is open source and available on GitHub.
  • Download the Idyl E3 NiFi processor.

Entity Query Language (EQL) NiFi Processor

This NiFi processor allows you to execute EQL queries against extracted entities. The EQL queries act as filters: when one or more entities satisfy an EQL query, those entities are output from the processor. This lets you trigger an action, such as a notification, when a certain entity is extracted. For example, the EQL query select * from entities where text = "George Washington" will cause only entities whose text is “George Washington” to continue through the NiFi dataflow.
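
To make the filtering behavior concrete, the following is a minimal Python sketch of what such a query does conceptually. It is not the EQL processor's implementation, and the entity field names ("text", "type") and the function filter_by_text are assumptions made for illustration rather than Idyl E3's documented schema.

```python
from typing import Dict, List

# Hypothetical entity records. The field names "text" and "type" are
# assumptions for illustration, not Idyl E3's documented response format.
Entity = Dict[str, str]


def filter_by_text(entities: List[Entity], wanted: str) -> List[Entity]:
    """Keep only the entities whose text equals `wanted`.

    This mimics the effect of the query
    select * from entities where text = "George Washington".
    """
    return [entity for entity in entities if entity.get("text") == wanted]


entities = [
    {"text": "George Washington", "type": "person"},
    {"text": "Mount Vernon", "type": "place"},
]

# Only the matching entity would continue through the dataflow.
matched = filter_by_text(entities, "George Washington")
```

When no entity matches, the filter returns an empty list, which corresponds to nothing being passed on to the next processor.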

  • The EQL NiFi Processor is open source and available on GitHub.
  • Download the EQL NiFi processor.

NiFi Dataflow with Idyl E3

NiFi templates that use Idyl E3 are available on GitHub.

In the first template, the dataflow reads files from the local file system. The contents of each file are passed to an Idyl E3 NiFi processor, which sends them to Idyl E3 for entity extraction. The extracted entities then pass through an EQL NiFi processor. The entities that satisfy the EQL query are split by the SplitJson processor and then persisted to a MongoDB database. This template provides a complete ingest process covering text consumption, entity extraction, filtering, and entity persistence.

In the sample template, the EQL query is select * from entities, which allows all entities to pass through and be stored in MongoDB.
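
The stages of this template can be sketched in plain Python to show how data moves from one step to the next. This is an illustration only, not how NiFi or the processors are implemented: the extraction step is a stub that treats capitalized words as entities instead of calling Idyl E3, a Python list stands in for the MongoDB collection, and every function name here is hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

Entity = Dict[str, str]


def read_files(directory: Path) -> List[str]:
    """Stage 1: read the text of every file in a directory."""
    files = sorted(p for p in directory.iterdir() if p.is_file())
    return [p.read_text(encoding="utf-8") for p in files]


def extract_entities_stub(text: str) -> str:
    """Stage 2 stand-in: return a JSON array of entities.

    A real pipeline would send the text to Idyl E3 here. This stub
    simply treats each capitalized word as an entity (an assumption).
    """
    words = [word.strip(".,;:!?") for word in text.split()]
    return json.dumps([{"text": w} for w in words if w[:1].isupper()])


def run_pipeline(
    directory: Path,
    predicate: Callable[[Entity], bool],
    store: List[Entity],
) -> List[Entity]:
    """Run read, extract, filter, split, and persist for each file."""
    for text in read_files(directory):
        entities = json.loads(extract_entities_stub(text))
        # Stage 3: filter, playing the role of the EQL query.
        kept = [entity for entity in entities if predicate(entity)]
        # Stages 4 and 5: handle each entity on its own (as SplitJson
        # does with a JSON array) and persist it to the stand-in store.
        for entity in kept:
            store.append(entity)
    return store


# A predicate that accepts everything, like select * from entities.
def pass_all(entity: Entity) -> bool:
    return True


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_text("George Washington lived at Mount Vernon.", encoding="utf-8")
    stored = run_pipeline(Path(tmp), pass_all, [])
```

Swapping pass_all for a stricter predicate narrows what reaches the store, which is the role the EQL query plays in the template.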