Phirestream

Phirestream finds and removes sensitive information from Apache Kafka and Apache Pulsar streaming data.
 

Phirestream Features

Phirestream has many features to help safeguard sensitive information in your streaming text.

Identifying Sensitive Information in Text

Many types of sensitive information

Philter can currently identify: Ages, Bitcoin Addresses, Cities, Counties, Credit Cards, Custom Dictionaries, Custom Identifiers (such as medical record numbers and financial transaction numbers), Dates, Driver's License Numbers, Email Addresses, IBAN Codes, IP Addresses, MAC Addresses, Passport Numbers, Persons' Names, Phone/Fax Numbers, SSNs and TINs, Shipping Tracking Numbers, States, URLs, VINs, and Zip Codes.
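As a rough illustration of pattern-based identification, the sketch below finds two of the types listed above, email addresses and IP addresses, using regular expressions. The type names, the PATTERNS table, and the find_spans function are hypothetical. This is not how Philter's filters are implemented, only a sketch of the general idea.

```python
import re

# Hypothetical patterns for two of the listed types. Illustrative only: the
# IP pattern, for example, also accepts invalid octets such as 999.
PATTERNS = {
    "email-address": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "ip-address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def find_spans(text):
    """Return (type, start, end, value) tuples for every match, ordered by position."""
    spans = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((kind, match.start(), match.end(), match.group()))
    return sorted(spans, key=lambda span: span[1])


text = "Contact jo@example.com from 10.0.0.1."
for span in find_spans(text):
    print(span)
```

Each match carries its character offsets so that a later step can redact or replace it in place.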

Tailored for different domains

Phirestream’s NLP model can be changed to suit your domain and use case for more accurate results. We offer specialized models for healthcare and COVID-19 text, and we continually improve our models to provide the best possible performance.

Disambiguate the types of sensitive information

Some sensitive information can match more than one type. For example, a string of digits could be either a phone number or an SSN. When such a conflict occurs, Phirestream can disambiguate the value and determine the most likely type.
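One simple way to break such a tie is to look at the words surrounding the value. The sketch below scores each candidate type by how many of its context words appear near the value. The CONTEXT table, the window size, and the disambiguate function are hypothetical assumptions for illustration, not Phirestream's actual disambiguation method.

```python
import re

# Hypothetical context words that hint at each candidate type.
CONTEXT = {
    "phone-number": {"phone", "call", "tel", "fax", "mobile"},
    "ssn": {"ssn", "social", "security"},
}


def disambiguate(text, start, end, candidates, window=5):
    """Choose the candidate type whose context words appear most often near text[start:end].

    Ties go to the earliest candidate in the list, so put the preferred default first.
    """
    before = re.findall(r"[a-z]+", text[:start].lower())[-window:]
    after = re.findall(r"[a-z]+", text[end:].lower())[:window]
    nearby = before + after
    scores = {c: sum(word in CONTEXT[c] for word in nearby) for c in candidates}
    # max() returns the first candidate among equal scores.
    return max(candidates, key=lambda c: scores[c])


text = "Her SSN is 123456789."
start = text.index("123456789")
print(disambiguate(text, start, start + 9, ["phone-number", "ssn"]))  # ssn
```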

Filter profiles provide flexibility

Filter profiles tell Philter which types of sensitive information to look for and what to do with them. You can create an unlimited number of filter profiles, and each one can be customized independently.
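A filter profile can be pictured as a mapping from each type of sensitive information to what should happen to it. The sketch below models two hypothetical profiles as Python dicts and applies one to a list of already-identified spans. The profile layout, the "REDACT" strategy name, and the {{{REDACTED-type}}} token are assumptions, not Philter's documented filter profile format.

```python
# Hypothetical filter profiles: each maps a type of sensitive information to a
# strategy. The layout and the "REDACT" name are assumptions, not Philter's schema.
PROFILES = {
    "default": {"email-address": "REDACT", "ip-address": "REDACT"},
    "emails-only": {"email-address": "REDACT"},
}


def apply_profile(text, spans, profile_name):
    """Redact every span whose type the named profile marks as REDACT.

    `spans` holds non-overlapping (type, start, end, value) tuples.
    """
    profile = PROFILES[profile_name]
    pieces, last = [], 0
    for kind, start, end, _value in sorted(spans, key=lambda span: span[1]):
        if profile.get(kind) == "REDACT":
            pieces.append(text[last:start])
            pieces.append("{{{REDACTED-%s}}}" % kind)
            last = end
    pieces.append(text[last:])
    return "".join(pieces)


text = "Contact jo@example.com from 10.0.0.1."
spans = [("email-address", 8, 22, "jo@example.com"), ("ip-address", 28, 36, "10.0.0.1")]
print(apply_profile(text, spans, "emails-only"))
```

The same text yields different output depending on the profile chosen, which is what lets a single deployment serve several use cases.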

State-of-the-art NLP

Philter uses state-of-the-art natural language processing (NLP) to analyze text. Using trained models, Philter can identify persons' names in text from many domains. You can also use your own NLP models with Philter.

Simple API

Philter’s API for filtering sensitive information from text is simple. Call it from one of our open-source SDKs for Java, .NET, or Go, or from your own scripts using a tool such as curl.

Redacting and Manipulating Sensitive Information

Customizable redaction logic

You can apply different redaction logic depending on conditions, such as the population of a zip code or the content and type of the sensitive information.
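As a minimal sketch of a population-based condition, the function below keeps only the first three digits of a zip code when the area's population exceeds a threshold and masks the whole code otherwise. The redact_zip function, the 20,000 threshold, and the "*" masking are illustrative assumptions; in practice the population would come from a lookup such as census data.

```python
def redact_zip(zip_code: str, population: int, threshold: int = 20_000) -> str:
    """Conditionally redact a five-digit zip code based on its population.

    Populous areas keep their first three digits; sparsely populated ones are
    masked entirely because a small area identifies people more easily.
    The threshold is an illustrative assumption.
    """
    if population > threshold:
        return zip_code[:3] + "**"
    return "*****"


print(redact_zip("12345", population=50_000))  # 123**
print(redact_zip("12345", population=800))     # *****
```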

Redact and replace with realistic values

Keep your documents useful! Philter can replace sensitive information with similar but random values so documents remain usable for secondary purposes. Philter can generate random names, phone numbers, and more.
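The sketch below shows one way to produce realistic stand-ins: digits are swapped for random digits so a phone number keeps its format, and names are drawn from a pool. The function names and the FIRST_NAMES pool are hypothetical, and this illustrates the idea only, not how Philter generates its values.

```python
import random

# Hypothetical pool of replacement names; a real system would use a far larger list.
FIRST_NAMES = ["Alex", "Jordan", "Morgan", "Taylor"]


def random_digits_like(value: str, rng: random.Random) -> str:
    """Swap every digit for a random digit, keeping punctuation so the format survives."""
    return "".join(str(rng.randrange(10)) if ch.isdigit() else ch for ch in value)


def random_name(rng: random.Random) -> str:
    """Pick a random replacement name from the pool."""
    return rng.choice(FIRST_NAMES)


rng = random.Random(7)  # seeded only so the example is reproducible
print(random_digits_like("(555) 123-4567", rng))
print(random_name(rng))
```

Passing the random generator in explicitly keeps the functions testable; an unseeded generator would be used in practice.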

Encrypt sensitive information

Sensitive information that is found can be encrypted or replaced by its SHA-256 hash value, keeping the original values secure.
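Replacing a value with its SHA-256 hash takes only Python's standard library, as sketched below; the function names are hypothetical. Because values such as SSNs have relatively few possibilities, an unkeyed hash can be reversed by hashing every candidate, so the sketch also shows a keyed variant using HMAC-SHA-256. Neither function reflects how Philter itself computes its hashes.

```python
import hashlib
import hmac


def hash_value(value: str) -> str:
    """Replace a value with the hex SHA-256 digest of its UTF-8 bytes."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def keyed_hash_value(value: str, key: bytes) -> str:
    """HMAC-SHA-256 variant: without the secret key, candidate values cannot be hashed and compared."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


print(hash_value("abc"))
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```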

Consistent replacement

Philter can preserve the meaning of documents by consistently replacing each value with the same replacement. For instance, if the same name appears across multiple documents, Philter can replace that name with the same randomly generated name each time.
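Consistent replacement can be sketched with a cache keyed on the original value: the first time a value is seen, a random replacement is chosen and remembered, and later occurrences reuse it. The ConsistentReplacer class is hypothetical. It keeps its cache in memory, so sharing replacements across processes or restarts would need an external store.

```python
import random


class ConsistentReplacer:
    """Map each (type, value) pair to one randomly chosen replacement and reuse it.

    The cache lives in memory, so consistency holds only within one process.
    """

    def __init__(self, candidates, seed=None):
        self._candidates = list(candidates)
        self._rng = random.Random(seed)
        self._cache = {}

    def replace(self, kind, value):
        key = (kind, value)
        if key not in self._cache:
            # First sighting: pick a replacement and remember it.
            self._cache[key] = self._rng.choice(self._candidates)
        return self._cache[key]


replacer = ConsistentReplacer(["Alex", "Jordan", "Morgan", "Taylor"], seed=1)
first = replacer.replace("person", "John Smith")
second = replacer.replace("person", "John Smith")
print(first == second)  # True
```

With a small pool, two different originals can draw the same replacement; a real implementation would need to prevent such collisions if they matter.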

Managing Sensitive Information

Store sensitive information externally

Sensitive information found in text can optionally be stored in an external store (Elasticsearch) for easy analysis and searching.

Generate alerts

Philter can generate alerts when certain information is found. When sensitive information satisfies a condition you provide, an alert is generated. Use alerts to stay aware of what information your text contains.
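A minimal sketch of condition-based alerting: each rule pairs a name with a predicate over a finding's type and value, and every finding that satisfies a rule produces an alert. AlertRule and check_alerts are hypothetical names, not part of Philter's API.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class AlertRule:
    """A named condition over a finding's (type, value)."""
    name: str
    condition: Callable[[str, str], bool]


def check_alerts(findings: List[Tuple[str, str]], rules: List[AlertRule]) -> List[str]:
    """Return one alert message per (finding, rule) pair whose condition holds."""
    alerts = []
    for kind, value in findings:
        for rule in rules:
            if rule.condition(kind, value):
                # Report the type only, so the alert itself does not leak the value.
                alerts.append(f"{rule.name}: {kind} found")
    return alerts


rules = [AlertRule("ssn-found", lambda kind, value: kind == "ssn")]
print(check_alerts([("ssn", "123-45-6789"), ("email-address", "a@b.co")], rules))
```

Keeping the sensitive value out of the alert message means alerts can be routed to ordinary notification channels without spreading the data they warn about.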

Monitoring and Metrics

Capture metrics

Philter can generate metrics while it analyzes your text. Metrics can be published to Amazon CloudWatch and Datadog and exposed via JMX. The metrics show the counts and types of sensitive information in your text.
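The counts behind such metrics can be accumulated as sketched below. FindingMetrics and the "filtered.<type>" naming scheme are hypothetical; publishing the snapshot to CloudWatch, Datadog, or JMX would use those systems' own client libraries and is not shown.

```python
from collections import Counter


class FindingMetrics:
    """Accumulate counts of sensitive information per type across many messages."""

    def __init__(self):
        self.total = Counter()
        self.messages = 0

    def record(self, findings):
        """Count one analyzed message and the (type, value) findings it produced."""
        self.messages += 1
        self.total.update(kind for kind, _value in findings)

    def snapshot(self, prefix="filtered"):
        # Hypothetical metric naming scheme: "<prefix>.<type>" -> count.
        return {f"{prefix}.{kind}": n for kind, n in sorted(self.total.items())}


metrics = FindingMetrics()
metrics.record([("ssn", "123-45-6789"), ("email-address", "a@b.co")])
print(metrics.snapshot())
```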

Monitor Philter’s health via the API

Philter’s API includes a health endpoint for convenient monitoring. The endpoint is suitable for uptime monitoring services and for health checks by cloud services such as Amazon Elastic Load Balancing.