We are happy to announce the release of Philter 1.2.0. This version brings new features to filter profiles along with some minor changes. Philter 1.2.0 will be available on the cloud marketplaces in a day or two. Let’s get to it and see what’s new!
Philter is an application that analyzes text for personally identifiable information (PII) and protected health information (PHI) and removes or manipulates those items when found. The types of information Philter looks for, and how it acts on that information, are defined in a filter profile. A filter profile is just a file that lists the types of PII/PHI you are interested in, such as credit card numbers and persons’ names. Philter is available on the AWS Marketplace, Azure Marketplace, and GCP Marketplace.
What’s New in Philter 1.2.0
Filter-Specific Ignore Lists
A filter profile can now have lists of ignored terms specific to each filter type. For example, let’s say there is a number “123-45-6789” in your text and it keeps getting identified as an SSN because it fits the SSN format. However, you know this number is not an SSN and do not want it removed. You can now add “123-45-6789” to a list of ignored terms for the SSN filter to prevent it from being removed from the text. Each type of filter has its own ignore list.
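To make the behavior concrete, here is a minimal Python sketch of a pattern-based SSN filter that checks its own ignore list before redacting a match. The `filter_ssns` function, the simplified `ddd-dd-dddd` pattern, and the `[REDACTED]` replacement are hypothetical illustrations, not Philter’s implementation or its filter profile format.

```python
import re

# Simplified SSN shape (ddd-dd-dddd); an assumption for illustration only.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def filter_ssns(text: str, ignored: set[str]) -> str:
    """Redact SSN-shaped strings unless they appear in the SSN filter's ignore list."""

    def replace(match: re.Match) -> str:
        term = match.group(0)
        # Ignored terms are returned unchanged; everything else is redacted.
        return term if term in ignored else "[REDACTED]"

    return SSN_PATTERN.sub(replace, text)


ssn_ignore_list = {"123-45-6789"}
print(filter_ssns("Order 123-45-6789 shipped; SSN 987-65-4321.", ssn_ignore_list))
# Order 123-45-6789 shipped; SSN [REDACTED].
```

Both numbers fit the SSN pattern, but only the one that is not on the SSN filter’s ignore list is removed.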
Global Ignore Lists
A filter profile can now have zero or more ignore lists that apply to all filter types. Items in these lists are ignored by every filter, so they will never be removed from the input text.
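A minimal Python sketch of how global ignore lists could combine with filter-specific ones: every global list is merged into one set, and each filter skips a match if it appears either on its own list or in that merged set. The two filter types, their simplified patterns, and the `apply_filters` function are hypothetical and only illustrate the rule described above.

```python
import re

# Hypothetical filter types with simplified patterns, for illustration only.
FILTERS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def apply_filters(
    text: str,
    filter_ignores: dict[str, set[str]],
    global_ignores: list[set[str]],
) -> str:
    """Run every filter, skipping terms on its own ignore list or any global list."""
    # Zero or more global lists are merged into one set shared by all filters.
    globally_ignored = set().union(*global_ignores)
    for filter_type, pattern in FILTERS.items():
        ignored = filter_ignores.get(filter_type, set()) | globally_ignored

        def replace(match: re.Match) -> str:
            term = match.group(0)
            return term if term in ignored else "[REDACTED]"

        # sub() calls replace() immediately, so it sees this iteration's `ignored`.
        text = pattern.sub(replace, text)
    return text


print(apply_filters("Call 555-123-4567 about ID 000-00-0000.", {}, [{"555-123-4567"}]))
# Call 555-123-4567 about ID [REDACTED].
```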
Enabling and Disabling Filters
Previously, to disable a filter type you had to delete it from the filter profile. That can be a problem because the filter may have configuration you don’t want to lose. New in Philter 1.2.0, each filter type has an enabled property that controls whether the filter is applied. When it is set to false, the filter is not applied. It defaults to true, so every filter type is enabled unless you turn it off.
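The effect of the enabled property can be sketched in a few lines of Python. The `FilterSettings` class, its `pattern` field, and the `zip` filter name are hypothetical; only the `enabled` flag with a default of true mirrors the behavior described above.

```python
import re
from dataclasses import dataclass


@dataclass
class FilterSettings:
    """Hypothetical per-filter settings; `enabled` mirrors the new 1.2.0 property."""

    pattern: str
    enabled: bool = True  # Defaults to true: a filter runs unless explicitly disabled.


def apply_enabled_filters(text: str, profile: dict[str, FilterSettings]) -> str:
    for settings in profile.values():
        if not settings.enabled:
            # Disabled filters keep their configuration but are skipped.
            continue
        text = re.sub(settings.pattern, "[REDACTED]", text)
    return text


profile = {
    "ssn": FilterSettings(pattern=r"\b\d{3}-\d{2}-\d{4}\b"),
    "zip": FilterSettings(pattern=r"\b\d{5}\b", enabled=False),
}
print(apply_enabled_filters("SSN 123-45-6789, ZIP 90210", profile))
# SSN [REDACTED], ZIP 90210
```

Turning the `zip` filter back on is a one-field change; its pattern never had to be deleted.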
Invalid Credit Card Numbers
Philter identifies credit card numbers based on their patterns and on the algorithm used to generate them. Philter 1.2.0 adds a new option to the credit card filter type that allows invalid credit card numbers to be filtered as well. An invalid credit card number is one that matches the pattern of a credit card number but fails the Luhn checksum that valid card numbers satisfy. This option is disabled by default.
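For reference, the Luhn check itself is short. The Python sketch below implements the standard algorithm; the `should_redact_card` policy and its `include_invalid` parameter are hypothetical names that model the new option, not Philter’s actual setting.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Starting from the rightmost digit, double every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # Same as summing the two digits of the product.
        total += digit
    return total % 10 == 0


def should_redact_card(candidate: str, include_invalid: bool = False) -> bool:
    """Hypothetical policy: by default only Luhn-valid numbers are filtered."""
    return include_invalid or luhn_valid(candidate)


print(luhn_valid("4111 1111 1111 1111"))  # True
print(luhn_valid("4111 1111 1111 1112"))  # False
```

The second number has the same shape as the first, so a pattern alone would match it; only the checksum tells them apart.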
Invalid Dates
Philter identifies dates based on date patterns. Sometimes a date may match a valid pattern but not be a real date, such as February 30 or even March 45. Philter 1.2.0 adds a new option to the date filter that requires identified dates to be valid. When the option is enabled, matches that are not valid dates are not removed from the text. This option is disabled by default.
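A minimal Python sketch of the idea, assuming a simplified US-style MM/DD/YYYY pattern: the pattern accepts impossible dates, and the standard library’s `datetime.date` constructor is used to reject them. The `redact_dates` function and its `only_valid_dates` flag are hypothetical stand-ins for the new option.

```python
import re
from datetime import date

# Simplified MM/DD/YYYY shape; "02/30/2020" and "03/45/2020" both match it.
DATE_PATTERN = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")


def is_real_date(month: int, day: int, year: int) -> bool:
    """date() raises ValueError for impossible dates such as February 30."""
    try:
        date(year, month, day)
    except ValueError:
        return False
    return True


def redact_dates(text: str, only_valid_dates: bool = False) -> str:
    """Redact date-shaped strings; with the option on, skip impossible dates."""

    def replace(match: re.Match) -> str:
        month, day, year = (int(g) for g in match.groups())
        if only_valid_dates and not is_real_date(month, day, year):
            return match.group(0)  # Not a real date, so leave it in the text.
        return "[REDACTED]"

    return DATE_PATTERN.sub(replace, text)


print(redact_dates("Seen 02/30/2020 and 03/14/2020"))
# Seen [REDACTED] and [REDACTED]
print(redact_dates("Seen 02/30/2020 and 03/14/2020", only_valid_dates=True))
# Seen 02/30/2020 and [REDACTED]
```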
Option to Remove Punctuation
Philter 1.2.0 adds a new option to the filter profile that removes punctuation from the input text before named-entity recognition processes it. The option is disabled by default, so punctuation is not removed. Removing punctuation can help when punctuation is being included in identified entities, for example when the last word of a sentence is a name and the trailing period ends up in the filtered text. (This doesn’t always happen, and we’re working on reducing these occurrences further through improvements to the named-entity recognition capability.)
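As a rough illustration of what a punctuation-removal step does, here is a Python sketch that deletes every ASCII punctuation character using `str.translate`. Which characters Philter actually removes is not stated here, so treating all of `string.punctuation` as removable is an assumption.

```python
import string

# Translation table that maps every ASCII punctuation character to None (deletes it).
_STRIP_PUNCTUATION = str.maketrans("", "", string.punctuation)


def remove_punctuation(text: str) -> str:
    """Return `text` with all ASCII punctuation removed."""
    return text.translate(_STRIP_PUNCTUATION)


print(remove_punctuation("The report was signed by John Smith."))
# The report was signed by John Smith
```

Note that a blanket approach like this also changes tokens such as “O’Brien” or hyphenated identifiers, which is one reason to keep the option off unless trailing punctuation is actually causing problems.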
Encrypting Connections to Redis
Philter’s consistent anonymization feature stores the identified text in a Redis cache. This allows a clustered Philter installation to replace identified text consistently across all instances of Philter. (When Redis is not used, the identified text values are stored in memory on each Philter instance.) Philter 1.2.0 requires all connections to the Redis cache to be encrypted and to use a Redis auth token.
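The core of consistent anonymization is a lookup table from identified text to its replacement. The Python sketch below keeps that table in a local dictionary, which corresponds to the in-memory, single-instance case; a shared cache such as Redis plays the same role across a cluster. The `ConsistentAnonymizer` class and the `REDACTED-n` replacement format are hypothetical.

```python
from itertools import count


class ConsistentAnonymizer:
    """Hypothetical in-memory store: the same input always gets the same replacement."""

    def __init__(self) -> None:
        self._replacements: dict[str, str] = {}
        self._counter = count(1)

    def replace(self, identified: str) -> str:
        if identified not in self._replacements:
            # First sighting: assign a new surrogate and remember it.
            self._replacements[identified] = f"REDACTED-{next(self._counter)}"
        return self._replacements[identified]


anonymizer = ConsistentAnonymizer()
print([anonymizer.replace(name) for name in ["John Smith", "Jane Doe", "John Smith"]])
# ['REDACTED-1', 'REDACTED-2', 'REDACTED-1']
```

Because this dictionary lives in one process, two instances would assign different surrogates to the same name, which is exactly the gap a shared cache closes.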