Using Philter with Apache NiFi for Data Flow PHI Filtering

This article describes how Philter can be used with Apache NiFi to perform PHI filtering as part of a NiFi data flow pipeline.

Integrating Philter with an Apache NiFi Flow

To integrate Philter with Apache NiFi, we will use Philter’s REST API to process text. The NiFi flow sends text to Philter, and Philter returns the filtered text. NiFi’s InvokeHTTP processor makes the API call to Philter. (The REST API is enabled by default in Philter’s configuration.) In this sample flow, Apache Kafka manages the incoming and outgoing streaming text.
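To make the request/response exchange concrete, here is a minimal Python sketch of the kind of call InvokeHTTP makes on the flow's behalf. The host, port 8080, the `/api/filter` path, and the `text/plain` content type are assumptions for illustration, so check them against your Philter installation. The function names are hypothetical. Certificate handling is left at Python's defaults, so an instance using a self-signed certificate would need a custom SSL context.

```python
import urllib.request

# Assumed endpoint for illustration; adjust host, port, and path to your deployment.
PHILTER_URL = "https://localhost:8080/api/filter"


def build_filter_request(text: str, url: str = PHILTER_URL) -> urllib.request.Request:
    """Build a POST request that carries the raw text as the request body."""
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def filter_text(text: str, url: str = PHILTER_URL, timeout: float = 30.0) -> str:
    """Send text to Philter and return the response body (the filtered text)."""
    request = build_filter_request(text, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

In the NiFi flow, the request body is the FlowFile content, and the response body becomes the content of the FlowFile routed onward.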

Example NiFi Pipeline

In the example pipeline, the text to be processed has previously been pushed to an Apache Kafka cluster. A ConsumeKafka processor consumes the text from Kafka. An InvokeHTTP processor then sends the text to Philter via Philter’s REST API. Philter responds with the filtered text, which NiFi puts onto a separate Kafka topic via the PutKafka processor. When complete, we have two topics on Kafka: the first contains the original text containing PHI, and the second contains the text that has been processed by Philter.
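The data movement through the three processors can be sketched in plain Python. This is only an illustration of the flow's shape, not how NiFi runs it. The two Kafka topics are modeled as in-memory lists, and `stub_filter` is a hypothetical stand-in for the call to Philter that masks a single hard-coded name.

```python
from typing import Callable, Iterable, List


def run_flow(
    incoming: Iterable[str],
    filter_text: Callable[[str], str],
    publish: Callable[[str], None],
) -> int:
    """Pass each incoming message through the filter and publish the result."""
    processed = 0
    for message in incoming:             # ConsumeKafka: read from the source topic
        filtered = filter_text(message)  # InvokeHTTP: send to Philter, get filtered text
        publish(filtered)                # PutKafka: write to the destination topic
        processed += 1
    return processed


def stub_filter(text: str) -> str:
    """Stand-in for the Philter call; it only masks one hard-coded name."""
    return text.replace("John Smith", "[REDACTED]")


# In-memory stand-ins for the two Kafka topics.
source_topic = ["John Smith was admitted on Tuesday.", "No identifiers here."]
filtered_topic: List[str] = []

count = run_flow(source_topic, stub_filter, filtered_topic.append)
print(count, filtered_topic)
```

The source list is left unchanged, mirroring the end state described above: one topic holds the original text and the other holds the filtered text.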

This integration uses only processors that are included with the standard NiFi distribution, so the flow does not depend on any custom processors being installed.

Processor configurations:

 

Considerations

We are using a single instance of Philter in this article. For a production environment, a cluster of Philter instances deployed behind a load balancer would provide improved performance. The only change to the NiFi flow configuration would be to change the InvokeHTTP processor’s Remote URL to point to the load balancer instead of an individual instance of Philter.
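As a rough sketch of how small that change is, the snippet below models the InvokeHTTP settings as a Python dictionary and swaps only the Remote URL. The host names, the `/api/filter` path, and the property values are assumptions for illustration, not the configuration used in this article. `point_at_load_balancer` is a hypothetical helper.

```python
# Hypothetical InvokeHTTP settings for a single Philter instance (values assumed).
single_instance = {
    "HTTP Method": "POST",
    "Remote URL": "https://philter-1.example.com:8080/api/filter",
    "Content-Type": "text/plain",
}


def point_at_load_balancer(properties: dict, balancer_url: str) -> dict:
    """Return a copy of the settings with only the Remote URL replaced."""
    updated = dict(properties)
    updated["Remote URL"] = balancer_url
    return updated


clustered = point_at_load_balancer(
    single_instance, "https://philter-lb.example.com/api/filter"
)

# List the properties that differ between the two configurations.
changed = sorted(k for k in single_instance if single_instance[k] != clustered[k])
print(changed)  # prints ['Remote URL']
```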