Sneak Peek at Philter Big-Data and ETL Integrations

As Philter nears general availability, we would like to take a minute to offer a quick look at its integrations with other applications. Philter integrates with Apache NiFi, Apache Kafka, and Apache Pulsar to provide PHI/PII filtering across your big-data and ETL ecosystems. We are very excited to offer these integrations for such awesome and popular open source applications.

To recap, Philter is an application that identifies and, optionally, removes or replaces protected health information (PHI) and personally identifiable information (PII) in natural language text.

Apache NiFi

Philter provides a custom Apache NiFi processor NAR that you can plug into your existing NiFi installations by copying the NAR file to NiFi’s lib directory. The processor lets you identify and replace PHI and PII directly in your NiFi flow without requiring any external services. The processor’s configuration is similar to Philter’s standard configuration. The processor accepts a filter profile, an optional MongoDB URI for storing replaced values, and a cache that maintains state so that values are anonymized consistently. For the cache, the processor utilizes NiFi’s built-in DistributedMapCacheServer.

The processor operates on the content of the incoming flowfile: it filters the content and replaces it with the filtered text. An outbound relationship passes the filtered text to the downstream processors.
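To illustrate why the processor needs a cache for consistent anonymization, here is a minimal Python sketch. It is not Philter's implementation: the `ReplacementCache` class, the `anonymize` function, the `SSN-0001` token format, and the Social Security number pattern are all assumptions made for this example, and a plain dictionary stands in for NiFi's DistributedMapCacheServer.

```python
import re

# Hypothetical pattern used only for this sketch: US Social Security numbers.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


class ReplacementCache:
    """In-memory stand-in for a shared cache of original -> replacement values."""

    def __init__(self):
        self._replacements = {}

    def get_or_create(self, original):
        # Reuse the earlier replacement if this value has been seen before;
        # otherwise mint a new token and remember it.
        if original not in self._replacements:
            token = "SSN-{:04d}".format(len(self._replacements) + 1)
            self._replacements[original] = token
        return self._replacements[original]


def anonymize(text, cache):
    """Replace each matched value with its cached replacement token."""
    return SSN_PATTERN.sub(lambda match: cache.get_or_create(match.group(0)), text)


if __name__ == "__main__":
    cache = ReplacementCache()
    print(anonymize("Patient SSN 123-45-6789 was admitted.", cache))
    print(anonymize("Follow-up for 123-45-6789 and 987-65-4321.", cache))
```

Because both calls share one cache, the same original value receives the same token in both documents. Without shared state, each flowfile would be anonymized independently and the link between records would be lost.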

Apache Kafka

Philter integrates with Apache Kafka by consuming text from a Kafka topic, filtering it, and publishing the filtered text to a different Kafka topic. Philter does this in a performant and fault-tolerant manner by leveraging the Apache Flink streaming framework. This integration fits well into existing pipelines where text is already being consumed from Kafka for processing, because it requires minimal changes to the pipeline. Simply provide the appropriate configuration values to the Philter job and update your topic names.
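The consume, filter, and publish pattern described above can be sketched in a few lines of Python. This is only an illustration of the data flow, not the Philter job itself: `run_filter_job` is a hypothetical name, in-memory deques stand in for the input and output Kafka topics, and the filter is passed in as an ordinary function. The real integration runs on Apache Flink, which supplies the fault tolerance that this sketch lacks.

```python
from collections import deque


def run_filter_job(input_topic, output_topic, filter_text):
    """Drain input_topic, filter each message, and append it to output_topic.

    Returns the number of messages processed.
    """
    processed = 0
    while input_topic:
        message = input_topic.popleft()            # consume from the input topic
        output_topic.append(filter_text(message))  # publish the filtered text
        processed += 1
    return processed


if __name__ == "__main__":
    source = deque(["Call John Smith today.", "No names here."])
    sink = deque()
    # Trivial stand-in filter; a real job would apply Philter's filtering here.
    count = run_filter_job(
        source, sink, lambda text: text.replace("John Smith", "[REDACTED]")
    )
    print(count, list(sink))
```

Downstream consumers only need to read from the output topic instead of the input topic, which is why the change to an existing pipeline is small.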

Apache Pulsar

Philter integrates with Apache Pulsar via Pulsar Functions. A Pulsar Function enables Pulsar to execute functions on the streaming data as it passes through Pulsar. Pulsar is similar to Kafka in its functionality as a massive pub/sub application but, unlike Kafka, it provides the ability to transform the data directly inside the application. This makes Pulsar Functions an ideal integration point for Philter in streaming architectures built on Apache Pulsar.
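To give a feel for the shape of a Pulsar Function, here is a minimal sketch written as a native Python function, which Pulsar supports as a module-level `process` function that takes a message and returns the result. This is not Philter's function: the email pattern and the `[REDACTED]` replacement are assumptions standing in for Philter's actual filtering.

```python
import re

# Hypothetical stand-in for PHI/PII detection: match email addresses only.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def process(input):
    """Receive one message from the input topic and return the filtered text.

    The parameter is named `input` to follow the convention used for native
    Python functions in Pulsar; the return value goes to the output topic.
    """
    return EMAIL_PATTERN.sub("[REDACTED]", input)


if __name__ == "__main__":
    print(process("Contact jane.doe@example.com for records."))
```

Since the function runs inside Pulsar, no separate consumer or producer application has to be deployed to do the filtering.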