Philter 1.7.0

We are happy to announce that Philter 1.7.0 has been released and is currently being published to DockerHub and the AWS, Azure, and Google Cloud marketplaces. Look for it to be available for deployment into your cloud in the next couple of days.

Click here to deploy Philter in your cloud of choice!

Philter finds and removes sensitive information, such as PII and PHI, in text. Philter can be integrated with virtually any platform, such as Apache Kafka, Apache Flink, Apache NiFi, Apache Pulsar, and Amazon Kinesis. Philter can redact, replace, encrypt, and hash sensitive information.

Philter is capable of redacting: Ages, Bitcoin Addresses, Cities, Counties, Credit Cards, Custom Dictionaries, Custom Identifiers (medical record numbers, financial transaction numbers), Dates, Driver's License Numbers, Email Addresses, IBAN Codes, IP Addresses, MAC Addresses, Passport Numbers, Persons' Names, Phone/Fax Numbers, SSNs and TINs, Shipping Tracking Numbers, States, URLs, VINs, and Zip Codes.
Philter Version
Launch Philter on AWS: 2.1.0
Launch Philter on Azure: 2.1.0
Launch Philter on Google Cloud: 2.1.0

What’s New in Philter 1.7.0?

Philter 1.7.0 brings a new experimental feature that breaks large text into smaller pieces for more efficient processing. The feature is described below. We welcome and encourage your feedback on it, but caution you that it may undergo major changes in future versions.

Some of the changes and new features in Philter 1.7.0 are described below. Refer to the Release History for a full list of changes.

Automatically Splitting Input Text

Philter 1.7.0 brings a new experimental feature that breaks long input text into pieces and processes each piece individually. After processing, Philter combines the individual results into a single response to the client. The purpose of this feature is to allow Philter to better handle long input text.

What counts as “long” input text depends on several factors, such as the hardware running Philter, the network, and the density of sensitive information in the text. Because of this, you have some control over how Philter breaks long text into separate pieces. You can choose between two methods of splitting. The first method splits the text at the locations of newline characters in the text. The second method splits the text into individual lines of nearly equal length.
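The two splitting methods can be illustrated with a short Python sketch. This is not Philter's implementation, and the function names and the max_length parameter are assumptions made for the example. It only shows the general idea of splitting at newlines versus splitting into pieces of nearly equal length.

```python
def split_on_newlines(text):
    """Split at newline characters, dropping pieces that are empty or blank."""
    return [piece for piece in text.splitlines() if piece.strip()]


def split_nearly_equal(text, max_length):
    """Split into the fewest pieces that fit max_length, sized as evenly as possible."""
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    if not text:
        return []
    count = -(-len(text) // max_length)     # ceiling division: pieces needed
    base, extra = divmod(len(text), count)  # the first `extra` pieces get one more character
    pieces, start = [], 0
    for i in range(count):
        end = start + base + (1 if i < extra else 0)
        pieces.append(text[start:end])
        start = end
    return pieces
```

Note that this simple length-based split can cut through the middle of a word or a sensitive value. A real splitter would likely need to choose boundaries more carefully.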

The alternative to letting Philter split the text is to split it yourself on the client side before sending it to Philter. When you split on the client side, you have full control over how the text is split. On the flip side, you also have to handle the individual response for each piece, something Philter handles for you when you delegate the splitting to Philter.
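A rough sketch of the client-side approach, in Python, might look like the following. The send argument stands for whatever function your client uses to submit one piece of text to Philter and return the filtered text. Both it and fake_send are hypothetical stand-ins for this example, not part of Philter's API, and the "[REDACTED]" marker is only illustrative.

```python
def filter_in_pieces(text, send, separator="\n"):
    """Split text on separator, filter each piece with send, and rejoin the results."""
    pieces = text.split(separator)
    # Blank pieces are passed through untouched so no request is spent on them.
    filtered = [send(piece) if piece.strip() else piece for piece in pieces]
    return separator.join(filtered)


def fake_send(piece):
    """Hypothetical stand-in for a request to Philter; masks one hard-coded name."""
    return piece.replace("John Smith", "[REDACTED]")


result = filter_in_pieces("John Smith was seen.\n\nNo names here.", fake_send)
```

Rejoining the filtered pieces with the same separator preserves the original line structure, which is the bookkeeping Philter performs for you when it does the splitting.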

Input text splitting is enabled and configured in filter profiles. Because the setting belongs to the filter profile, the profile chosen for a given text determines whether that text is split, so some text can be split while other text is not.

See Philter’s User’s Guide for how to configure splitting in a filter profile.

If you use this feature, please send us feedback. We are looking to improve it in future versions and value your input.

Reporting Metrics via Prometheus

Philter already supported metrics reporting via JMX, Amazon CloudWatch, and Datadog. In Philter 1.7.0 we added support for monitoring Philter’s metrics via Prometheus. When enabled, Philter exposes an HTTP endpoint suitable for scraping by Prometheus. See Philter’s Settings for details on how to enable the Prometheus metrics. Look for a separate blog post soon that dives into monitoring Philter’s metrics with Prometheus.
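For readers unfamiliar with scraping, the Python sketch below parses a small sample of the Prometheus text exposition format, the kind of body a scrape endpoint returns. The metric names in the sample are invented for illustration and are not Philter's actual metric names. The parser is deliberately minimal and assumes no optional timestamps follow the values.

```python
SAMPLE_BODY = """\
# HELP example_requests_total Hypothetical request counter.
# TYPE example_requests_total counter
example_requests_total 42
example_filter_seconds{filter="ssn"} 0.25
"""


def parse_metrics(body):
    """Map each series (name plus labels) to its value, skipping comments and blanks."""
    samples = {}
    for line in body.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last whitespace-separated token on the line.
        series, value = line.rsplit(None, 1)
        samples[series] = float(value)
    return samples


metrics = parse_metrics(SAMPLE_BODY)
```

In practice Prometheus itself does the scraping and parsing. A snippet like this is mainly useful for a quick check that the endpoint is returning the values you expect.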

Smaller AWS EBS Volume

The EBS volume size for Philter 1.7.0 has been reduced from 20 GB to 8 GB. The smaller SSD volume reduces the monthly cost of each Philter instance by $1.20. That may seem trivial for a single instance, but the savings add up when multiple Philter instances are deployed.
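The arithmetic behind that figure can be sketched in Python. The price of $0.10 per GB-month is an assumption chosen because it is consistent with the $1.20 saving stated above. Actual EBS pricing varies by volume type and region.

```python
OLD_SIZE_GB = 20
NEW_SIZE_GB = 8
PRICE_PER_GB_MONTH = 0.10  # assumed price in USD; check current pricing for your region


def monthly_savings(instances=1):
    """Monthly savings in USD from the smaller volume across a number of instances."""
    saved_gb = OLD_SIZE_GB - NEW_SIZE_GB
    return round(saved_gb * PRICE_PER_GB_MONTH * instances, 2)


single = monthly_savings()    # one instance
fleet = monthly_savings(50)   # a hypothetical fleet of 50 instances
```

Under this assumed price, one instance saves $1.20 per month and a fleet of 50 instances saves $60.00 per month.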

Other Changes

Other new features in Philter 1.7.0 include:

  • Terms can now be ignored based on regular expression patterns. Previously, Philter could ignore specified terms, but the terms had to match exactly. Now you can specify terms to ignore via regular expression patterns. An example use of this new feature is to ignore non-sensitive information that can change, such as timestamps in log messages.
  • Added ability to read ignored terms from files outside of the filter profile.
  • Custom dictionary terms can now be phrases or multi-term keywords.
  • Added “classification” condition to Identifier filter to allow for writing conditionals against the classification value.
  • Added configurable timeout values to allow for modifying timeouts of internal Philter communication. This can help when processing larger amounts of text. See the Settings for more information.
  • Added option to IBAN Code filter to allow spaces in the IBAN codes.
  • Ignore lists for individual filters are no longer case-sensitive. (For example, an ignored term of “JOHN” now also causes “John” to be ignored.)
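
The two ignore-related changes above, ignoring by regular expression and case-insensitive ignore lists, can be sketched in Python as follows. This is an illustration of the idea only. The function name, the timestamp pattern, and the use of Python's re syntax are assumptions for the example, and the pattern syntax Philter accepts is described in its documentation.

```python
import re

# Hypothetical pattern for ISO-style timestamps such as 2021-03-04T10:15:30.
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")

# Ignored terms are stored in lowercase so lookups can be case-insensitive.
IGNORED_TERMS = {"john"}


def is_ignored(term, patterns=(TIMESTAMP,), ignored_terms=IGNORED_TERMS):
    """Return True if term is in the ignore list (any case) or fully matches a pattern."""
    if term.lower() in ignored_terms:
        return True
    return any(pattern.fullmatch(term) for pattern in patterns)
```

With these definitions, is_ignored("JOHN") and is_ignored("2021-03-04T10:15:30") are both true, while is_ignored("Jane") is false.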