Philter offers two modes of operation for filtering text. The first is Philter’s REST API, which is useful for batch processing or when Philter needs to be integrated into an existing process or workflow. The second filters streaming text and uses Apache Kafka to do so.
Methods of Filtering
Filtering via the REST API
This method of filtering lets you submit text to Philter and receive the filtered text in the response. It is the most flexible method because it allows Philter to be integrated with virtually any existing system or process. See the details of the API.
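As a rough illustration of submitting text over HTTP, the sketch below builds a POST request carrying plain text and a per-request context, using only the Python standard library. The endpoint path `/api/filter`, the query parameter name `c`, the `text/plain` content type, and the host and port in the usage note are assumptions made for this example, not values confirmed on this page. Check the API details for the actual ones.

```python
import urllib.parse
import urllib.request


def build_filter_request(base_url, text, context,
                         path="/api/filter", context_param="c"):
    """Build (but do not send) a POST request that submits text for filtering.

    `path` and `context_param` are assumed defaults for illustration only.
    """
    query = urllib.parse.urlencode({context_param: context})
    url = f"{base_url.rstrip('/')}{path}?{query}"
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def filter_text(base_url, text, context):
    """Send the request and return the response body (the filtered text)."""
    request = build_filter_request(base_url, text, context)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

A call such as `filter_text("https://localhost:8080", "some text", "patients")` would then return the filtered text, provided a Philter instance is reachable at that (hypothetical) address.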
Filtering via Streaming
This method of filtering allows Philter to subscribe to an Apache Kafka topic. Philter consumes text from the topic, filters it, and places the filtered text back onto Apache Kafka in a different topic. Philter’s streaming is designed to run on a YARN cluster.
This method is the most performant, but it is more restrictive than filtering via the REST API. When streaming, each Kafka topic is treated as its own context, in contrast to the REST API, where each filter request can set its context individually.
When filtering via streaming, you can choose the format of the messages that are published to the Kafka topic. Each published message can be either the filtered text alone or JSON. When JSON is chosen, the message structure is as follows:
{
  "filteredText": "The filtered text will be here.",
  "context": "The context (incoming Kafka topic name) will be here.",
  "documentId": "The assigned document ID will be here."
}
The PII and PHI items identified by Philter are assigned “weight” values. The sum of these values for a given document indicates the amount of PHI/PII in the document. The weights can be customized per filter, and you can choose the weight values based on your use case. See Philter’s Configuration for how to adjust the filter weights.
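Returning to the two streaming output formats described earlier, a consumer of the filtered topic has to handle either plain filtered text or the JSON structure. The sketch below shows one way to normalize both into a single record type. The `FilteredMessage` class and the `parse_message` function are hypothetical names introduced for this example. Only the three JSON field names come from the message structure above, and the Kafka client that would supply the raw bytes is left out.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class FilteredMessage:
    """Hypothetical container for one message read from the output topic."""
    filtered_text: str
    context: Optional[str] = None      # incoming Kafka topic name (JSON format only)
    document_id: Optional[str] = None  # assigned document ID (JSON format only)


def parse_message(raw: bytes, json_format: bool) -> FilteredMessage:
    """Decode a published message in either of the two output formats."""
    text = raw.decode("utf-8")
    if not json_format:
        # Plain format: the message body is the filtered text itself.
        return FilteredMessage(filtered_text=text)
    payload = json.loads(text)
    return FilteredMessage(
        filtered_text=payload["filteredText"],
        context=payload["context"],
        document_id=payload["documentId"],
    )
```

Keeping the format choice as an explicit flag, rather than guessing from the payload, avoids misreading filtered text that happens to look like JSON.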