Frequently asked questions about Philter.
What is Philter?
Philter is an application that identifies and masks Personally Identifiable Information (PII) and Protected Health Information (PHI) in text. When you send text to Philter, Philter finds the PII and PHI in the text and either redacts it or replaces it with similar, but random and fake, values.
How does Philter know what kinds of PII and PHI to find?
Philter uses what we call filter profiles. A filter profile is a file that you give to Philter to tell it which types of PII and PHI you want to remove. A filter profile lists the types of PII and PHI, when to remove them, and how to remove them. Filter profiles are detailed in Philter’s User’s Guide. You can have as many filter profiles as you need, and you can select which one to use each time you submit text to Philter.
How do I send text to Philter for processing?
Philter’s HTTP-based API accepts text to process and returns the processed text. The API allows Philter to be integrated into many types of systems and processes. See the API section of Philter’s User’s Guide for more information. The following example sends a text file to Philter for processing:
curl -k -X POST "https://localhost:8080/api/filter?c=context" -d @file.txt -H "Content-Type: text/plain"
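The same request can also be made from application code. The following is a minimal sketch in Python using only the standard library. It assumes the endpoint, the c query parameter, and the text/plain content type shown in the command above; the function names build_filter_request and filter_text are hypothetical and are not part of Philter.

```python
import ssl
import urllib.parse
import urllib.request


def build_filter_request(text, context="context", base_url="https://localhost:8080"):
    """Build a POST request for the /api/filter endpoint shown above.

    The function name and defaults are illustrative assumptions.
    """
    query = urllib.parse.urlencode({"c": context})
    return urllib.request.Request(
        url=f"{base_url}/api/filter?{query}",
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def filter_text(text, context="context", base_url="https://localhost:8080", verify_tls=True):
    """Send text to Philter and return the response body as a string."""
    request = build_filter_request(text, context, base_url)
    tls_context = ssl.create_default_context()
    if not verify_tls:
        # Equivalent to curl's -k flag: skips certificate checks.
        # Only appropriate for testing against a self-signed certificate.
        tls_context.check_hostname = False
        tls_context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=tls_context) as response:
        return response.read().decode("utf-8")
```

Calling filter_text(open("file.txt").read(), verify_tls=False) would send the file's contents in the same way as the command above, assuming a Philter instance is listening at that address.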
Is Philter guaranteed to find 100% of all PHI and PII in my text?
No. Philter uses state-of-the-art natural language processing (NLP) technology to identify PHI and PII in text. These NLP methods use trained models created from a large corpus of text. The process of applying the model to text is non-deterministic. Many factors can affect the identification of PHI and PII in your text, such as how similar your text is to the corpus that was used to train the model, how the text is formatted, and the length of the text. For these reasons, it is important that you assess Philter’s performance on your own text before using it in a production system.
The confidence value in the filter strategy condition can be used to tune the NLP engine’s detection. Each identified entity has an associated confidence score between 0 and 100 indicating the model’s estimate that the text is actually an entity, with 0 being the lowest confidence and 100 being the highest. The condition lets you ignore entities whose confidence is too low. For example, the condition confidence > 75 means that entities with a confidence value of 75 or less are ignored, and only entities with a confidence value greater than 75 are filtered from the text.
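To make the effect of the threshold concrete, here is a minimal sketch in Python of how a condition such as confidence > 75 selects entities. It is an illustration only, not Philter's implementation; the Entity class, the function name, and the sample scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    text: str
    confidence: float  # score from 0 (lowest) to 100 (highest)


def apply_confidence_condition(entities, threshold=75):
    """Keep only entities whose confidence is strictly greater than the threshold.

    Entities at or below the threshold are ignored and left in the text.
    """
    return [entity for entity in entities if entity.confidence > threshold]


# Made-up entities and scores, for illustration only.
candidates = [
    Entity("John Smith", 92.0),
    Entity("May", 40.0),
    Entity("Boston", 75.0),  # exactly 75 does not satisfy "> 75"
]
kept = apply_confidence_condition(candidates, threshold=75)
print([entity.text for entity in kept])  # prints ['John Smith']
```

Raising the threshold reduces false positives at the cost of missing more real entities, and lowering it does the opposite, which is why the value should be tuned against your own text.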
How are the Standard and Enterprise Editions different?
Philter Enterprise Edition includes native integration with several big-data applications to offer improved support for streaming data and data pipelines. Philter Enterprise Edition supports reading and writing text from Apache Kafka, deep integration with Apache NiFi, and integration with Apache Pulsar. A full comparison is available on Philter’s home page.
How do I get Philter?
How does Philter compare to Phinder?
Phinder and Philter are complementary applications to manage protected health information. Philter identifies and removes PHI from text in motion, such as streaming text. Phinder analyzes your data lake and other storage locations to locate PII and PHI in data at rest.