Phirestream removes sensitive information from Apache Kafka streams.

Phirestream FAQ

Frequently asked questions about Phirestream. For any questions not answered here, please contact us.

What is Phirestream?

Phirestream is an application that filters sensitive information from data before that data is published to Apache Kafka. It acts as a proxy in front of Apache Kafka. When Phirestream receives data, it redacts, removes, or encrypts the types of sensitive information you have defined in Phirestream’s settings, and then publishes the filtered text to your Apache Kafka brokers.
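The receive, filter, publish flow described above can be illustrated with a minimal sketch. This is not Phirestream's code or API: the function names, the email pattern, and the replacement token are all hypothetical, and a plain Python list stands in for a Kafka broker so the example runs on its own.

```python
import re
from typing import Callable, List

# Hypothetical, simplified pattern for one type of sensitive information.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(text: str) -> str:
    """Replace anything that looks like an email address with a token."""
    return EMAIL_PATTERN.sub("{{REDACTED-email}}", text)


def proxy_publish(text: str, publish: Callable[[str], None]) -> str:
    """Filter the text first, then hand only the filtered text to the publisher."""
    filtered = redact_emails(text)
    publish(filtered)
    return filtered


# A list stands in for the Kafka broker (assumption made for this sketch).
broker_log: List[str] = []
proxy_publish("Contact jane.doe@example.com for details.", broker_log.append)
print(broker_log[0])  # Contact {{REDACTED-email}} for details.
```

The point of the sketch is the ordering: the unfiltered text never reaches the publisher, only the filtered result does.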

What types of sensitive information can Phirestream identify?

Phirestream can currently identify the following types of information:

Ages, Bitcoin Addresses, Cities, Counties, Credit Cards, Custom Dictionaries, Custom Identifiers (medical record numbers, financial transaction numbers), Dates, Driver's License Numbers, Email Addresses, IBAN Codes, IP Addresses, MAC Addresses, Passport Numbers, Persons' Names, Phone/Fax Numbers, SSNs and TINs, Shipping Tracking Numbers, States, URLs, VINs, Zip Codes

How does Phirestream know what kinds of sensitive information to find?

Phirestream uses filter profiles. A filter profile is a small file in which you describe the types of sensitive information you want to identify. You can have multiple filter profiles and can select which one to apply each time text is sent to Phirestream.
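To make the idea of choosing a profile per request concrete, here is a small sketch. It does not reproduce Phirestream's actual filter profile file format: the profile names, the type names, the patterns, and the apply_profile function are all assumptions made for illustration.

```python
import re
from typing import Dict, Set

# Hypothetical detectors, keyed by an illustrative type name.
PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "email-address": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

# Each hypothetical profile lists the types of information it should find.
FILTER_PROFILES: Dict[str, Set[str]] = {
    "emails-only": {"email-address"},
    "strict": {"email-address", "ssn"},
}


def apply_profile(text: str, profile_name: str) -> str:
    """Filter text using only the types enabled in the named profile."""
    enabled = FILTER_PROFILES[profile_name]
    for type_name in sorted(enabled):
        text = PATTERNS[type_name].sub(f"[{type_name}]", text)
    return text


message = "SSN 123-45-6789, mail bob@example.org"
print(apply_profile(message, "emails-only"))  # SSN 123-45-6789, mail [email-address]
print(apply_profile(message, "strict"))       # SSN [ssn], mail [email-address]
```

The same text yields different output depending on the profile selected, which is the behavior the answer above describes.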

How do I deploy Phirestream?

Phirestream can be deployed in AWS, Azure, and Google Cloud with just a few clicks. Click here to get started.

Is Phirestream guaranteed to find 100% of all sensitive information in my text?

No. Phirestream uses state-of-the-art natural language processing (NLP) technology to identify sensitive information in text. These NLP methods use models trained on a large corpus of text, and applying a model to text is non-deterministic. Many factors can affect the identification of sensitive information in your text, such as how similar your text is to the corpus used to train the model, how the text is formatted, and the length of the text. For these reasons, it is important that you assess Phirestream’s performance before using it in a production system.

The confidence value in the filter strategy condition can be used to tune the NLP engine’s detection. Each identified entity has an associated confidence score between 0 and 100 indicating the model’s estimate that the text is actually an entity, with 0 being the lowest confidence and 100 being the highest. The confidence value in the filter strategy condition lets you filter out entities based on that score. For example, the condition confidence > 75 means that entities with a confidence value of 75 or less will be ignored, and entities with a confidence value greater than 75 will be filtered from the text.
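The effect of a condition such as confidence > 75 can be sketched as follows. The Entity class, the apply_confidence_condition function, the replacement string, and the sample scores are hypothetical and only illustrate the thresholding logic; they are not Phirestream's implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entity:
    text: str          # the span the model identified
    start: int         # start offset in the input text (inclusive)
    end: int           # end offset in the input text (exclusive)
    confidence: float  # model confidence, 0 (lowest) to 100 (highest)


def apply_confidence_condition(
    text: str,
    entities: List[Entity],
    threshold: float = 75.0,
    replacement: str = "***",
) -> str:
    """Filter only entities whose confidence is strictly greater than threshold."""
    kept = [e for e in entities if e.confidence > threshold]
    # Replace from right to left so earlier offsets remain valid.
    for entity in sorted(kept, key=lambda e: e.start, reverse=True):
        text = text[: entity.start] + replacement + text[entity.end :]
    return text


sentence = "George Washington lived in Virginia."
found = [
    Entity("George Washington", 0, 17, 92.0),  # above the threshold: filtered
    Entity("Virginia", 27, 35, 60.0),          # at or below the threshold: ignored
]
print(apply_confidence_condition(sentence, found))  # *** lived in Virginia.
```

Raising the threshold reduces false positives at the cost of letting more low-confidence entities through unfiltered, so the value is worth tuning against your own text.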