Philter® Release Notes

This page contains the release notes for Philter showing what’s new, what’s changed, and any known outstanding issues. Please contact us for clarification or more information on any of the items listed on this page or to suggest new features that would be helpful to you!

The release notes on this page use the following notation:

  • “New” indicates a feature or capability that has been added to the version.
  • “Tweak” denotes a minor change to a feature or capability.
  • “Fix” describes a change to a feature or capability that corrects a difference between its expected and observed behavior.

Version 1.7.0 – TBD

  • New: Added an experimental feature for filtering long text. When enabled, the feature splits the text into pieces at line breaks, processes each piece, and reassembles the pieces before returning the result. Because of the splitting, the reassembled text will likely differ from the input text in its white space. If preserving the exact format of the input text is important to you, the best course of action is to handle the splitting on the client side so you have control over it. We are looking for feedback on this initial functionality.
  • New: Terms can now be ignored based on regular expression patterns. Previously, Philter could ignore specified terms only when they matched exactly. An example use of this new feature is to ignore non-sensitive information that can change, such as timestamps in log messages.
  • New: Added ability to read ignored terms from files outside of the filter profile.
  • New: Custom dictionary terms can now be phrases or multi-term keywords.
  • New: Added “classification” condition to Identifier filter to allow for writing conditionals against the classification value.
  • New: Added configurable timeout values to allow for modifying timeouts of internal Philter communication. This can help when processing larger amounts of text. See the Settings for more information.
  • New: Added option to IBAN Code filter to allow spaces in the IBAN codes.
  • New: Ignore lists for individual filters are no longer case-sensitive. (For example, “John” and “JOHN” are now treated as the same term.)
  • Fix: Fixed an issue in IBAN Code validation where an invalid IBAN Code would sometimes pass validation.
  • Fix: Improved performance when handling long input text.
  • Tweak: Updated base AWS AMI.
  • Tweak: Updated base Docker container to UBI 8.2.
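
The client-side splitting suggested in the long-text note above could look roughly like the following Python sketch. It splits the input at line breaks, filters each non-blank line separately, and puts the original line breaks back unchanged so the layout of the input survives. `filter_piece` is a hypothetical stand-in for whatever call sends one piece of text to Philter; it is not part of Philter's API, and this is not how Philter's built-in splitting is implemented.

```python
import re
from typing import Callable


def filter_preserving_layout(text: str, filter_piece: Callable[[str], str]) -> str:
    """Filter text line by line while keeping the original line breaks.

    filter_piece is a hypothetical callable that sends one piece of text
    to Philter and returns the filtered result.
    """
    # The capture group makes re.split keep the line-break separators, so
    # even indices hold line content and odd indices hold line breaks.
    parts = re.split(r"(\r?\n)", text)
    out = []
    for index, part in enumerate(parts):
        is_separator = index % 2 == 1
        if is_separator or not part.strip():
            # Line breaks and blank lines are passed through untouched.
            out.append(part)
        else:
            out.append(filter_piece(part))
    return "".join(out)
```

Filtering each line on its own means the filter never sees context from neighboring lines, which may matter for filters that rely on surrounding text.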

Version – August 31, 2020

This version of Philter will only be available on the Google Cloud Marketplace.

  • Updated the base image.

Version – August 24, 2020

This version of Philter will only be available on the AWS Marketplace.

  • Updated the base image.

Version – July 29, 2020

This version of Philter will only be available on the Google Cloud Marketplace.

  • Updated the base image.

Version 1.6.1 – July 8, 2020

This version (1.6.1) of Philter will only be available on the Microsoft Azure Marketplace.

  • Updated the Microsoft Azure base OS from CentOS 7.7 to CentOS 8.2.

Version 1.6.0 – June 9, 2020

Release Announcement Post

Version 1.6.0 brings many new features and enhancements. Some of these changes may impact your existing filter profiles. Please contact us for assistance if you encounter any difficulties adapting your filter profile for 1.6.0.

  • New: Added ability to generate alerts when a filter strategy condition is met. Generated alerts are available through Philter’s API. Alerts can be used to flag when certain sensitive information, such as a name, is identified.
  • New: Added a “span disambiguation” feature that disambiguates identified sensitive information for identical text. For example, if the text “123456789” is identified both as a phone number and an SSN, the span disambiguation will determine whether “123456789” more closely resembles a phone number or an SSN based on previously filtered text. The feature is optional and is disabled by default. Learn more about it in the User’s Guide.
  • New: Philter configuration properties can be set through environment variables.
  • New: Added Bitcoin address filter.
  • New: Added IBAN code filter.
  • New: Added tracking number filter for FedEx, UPS, and USPS.
  • New: Added a new replacement strategy to replace sensitive information with its SHA-256 hash value.
  • New: Added support for connecting to a Redis cache with a self-signed SSL certificate.
  • New: Added filter condition for “classification.” A classification can be an entity type such as “PER” for person, a passport country such as “US”, or a state for a driver’s license such as “CA”. A classification gives a more granular description of a piece of sensitive information.
  • New: Added “fuzzy” property to custom dictionary filter. When set to “true”, the filter allows loose matches. When set to “false”, terms must appear as listed in the dictionary. In most cases, setting “fuzzy” to “false” when loose matching is not needed gives a significant performance improvement.
  • New: Added “files” property to custom dictionary filter. You can now list custom dictionary terms in a file outside of the filter profile. This property is a list, so you can specify multiple files. You must provide the full path to each file on the local file system.
  • Tweak: Changed “type” to “classification” for filters in a filter profile.
  • Fix: Fixed issue where MAC address filter strategies may not be loaded correctly.
  • Fix: Fixed issue where custom dictionary terms that are set to be ignored may not be ignored correctly.
  • Fix: Fixed issue where valid credit card numbers may be determined to be invalid. (This only affects filter profiles in which the credit card filter has verification enabled.)
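
To illustrate the idea behind the SHA-256 replacement strategy listed above, here is a minimal Python sketch that replaces identified spans with the hash of their text. The UTF-8 encoding, the hex output, the absence of a salt, and the `(start, end)` span format are all assumptions made for this example; the release notes do not describe how Philter encodes or formats the hash.

```python
import hashlib
from typing import List, Tuple


def sha256_replacement(value: str) -> str:
    """Return the hex-encoded SHA-256 digest of a sensitive value."""
    # Assumption: UTF-8 input encoding, hex output, no salt.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def replace_spans(text: str, spans: List[Tuple[int, int]]) -> str:
    """Replace each (start, end) span of text with its SHA-256 hash.

    The span format is hypothetical; end offsets are exclusive.
    """
    # Work from right to left so earlier offsets stay valid after each
    # replacement changes the length of the text.
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + sha256_replacement(text[start:end]) + text[end:]
    return text
```

Because the same input always produces the same hash, identical sensitive values can still be correlated across documents after filtering. That also means an unsalted hash of a low-entropy value, such as an SSN, can be recovered by hashing every candidate value.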

Version 1.5.2 – May 20, 2020

Version 1.5.1 – May 8, 2020

  • There were no feature changes for this version and no need to upgrade from 1.5.0.
  • The changes were to allow Philter to run on RHEL8 on AWS.

Version 1.5.0 – May 1, 2020

  • New: Added new filter called “Section” to identify text between two markers.
  • New: Added ability to use custom NLP models.
  • New: Added ability to store filter profiles in an Amazon S3 bucket. This allows multiple instances of Philter to use the same filter profiles.
  • New: Added CloudFormation template and Terraform scripts to philter-infrastructure-as-code repository.
  • Tweak: Consolidated caches into a single cache.
  • Tweak: Model file can now be specified in the application properties.
  • Tweak: An error is generated at startup if API authentication is enabled but no API token is set.

Version 1.4.0 – April 10, 2020

  • New: Added optional basic authentication.
  • New: Added token condition to NerFilterStrategy. Conditions can now be written against the token itself.
  • New: Added confidence condition to each type of filter strategy.
  • Tweak: Ignored spans are now dropped prior to overlapping spans.
  • Tweak: Docker container now uses Java 11.
  • Fix: Fixed potential issue with filtering state abbreviations.

Version 1.3.1 – February 20, 2020

Release Announcement Post

  • New: Added CRYPTO_REPLACE redaction option to encrypt sensitive values.
  • New: Added %v redaction variable to be substituted for the original value of the sensitive text. With %v you can now annotate sensitive information instead of masking or removing it.
  • New: Added filter condition based on the context. You can now make a filter condition depend on the value of the context.
  • New: Added filter for network MAC addresses.
  • New: Added support for TINs (Tax Identification Numbers) to the SSN filter.
  • New: Now requires Java 11.
  • New: Client can set document ID per filter request instead of document ID always being auto-generated per request. This allows for splitting documents between multiple requests to increase throughput.
  • New: Philter Enterprise Edition is now certified for Red Hat Enterprise Linux 8.
  • Tweak: GCP image is now built on CentOS 8.
  • Tweak: Credit card filter now supports credit card numbers containing dashes and spaces.

Version 1.3.0 – January 28, 2020

Release Announcement Post

This release focuses mainly on improving performance and error handling. No new functionality was added.

  • New: Now supports identifying URLs that use an IP address instead of a domain name.
  • New: Added option to URL filtering to require a URL to begin with http, https, or www.
  • Tweak: Removed trailing spaces from filtered values when they exist.
  • Tweak: Improved performance on API requests.
  • Tweak: Improved performance for larger documents.
  • Tweak: Changed format of generated document ID to be more random.
  • Tweak: Improved error handling if an API request to filter is not successful.
  • Tweak: Improved handling of month names that appear on their own.
  • Tweak: When no filter strategies are specified, the default action will be to redact.

Version 1.2.0 – January 16, 2020

Release Announcement Post

  • New: Added ignore lists specific to each filter to list items that should never be removed. Each filter can have its own ignore list.
  • New: Added support for encrypted connections to Redis.
  • New: Added enabled property to individual filters in a filter profile. Filters having enabled=false will not be executed.
  • New: Added option to filter profile credit cards to also include invalid credit card numbers. (Credit card numbers that match the pattern but are not valid per the card’s number algorithm.)
  • New: Added option to filter profile to require dates be valid dates. (The date February 30 is not a valid date and would be excluded when enabled.)
  • New: Added option to filter profile for NER to remove punctuation prior to processing.
  • Fix: Fixed issue where conditionals may not be applied to NER entities.
  • Tweak: Added Philter version to status API response.
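
The distinction drawn above between card numbers that merely match the pattern and numbers that are actually valid can be sketched in Python as follows. This assumes the "card's number algorithm" is the Luhn checksum, the standard check for payment card numbers; the release notes do not name the algorithm, and this is not Philter's own code. Stripping spaces and dashes is a convenience added for this example.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the card number passes the Luhn checksum."""
    # Convenience for this sketch: tolerate spaces and dashes as separators.
    cleaned = number.replace(" ", "").replace("-", "")
    if len(cleaned) < 2 or not (cleaned.isascii() and cleaned.isdigit()):
        return False

    total = 0
    # Walk the digits from right to left, doubling every second digit.
    for position, char in enumerate(reversed(cleaned)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                # Equivalent to adding the two digits of the product.
                digit -= 9
        total += digit
    return total % 10 == 0
```

A number such as the widely used test value 4111 1111 1111 1111 passes this check, while changing its last digit makes it fail even though it still looks like a card number.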

Version 1.1.0 – December 15, 2019

Release Announcement Post

  • New: Store changed from MongoDB to Elasticsearch for improved querying capabilities.
  • New: Added “auto” setting for distance to automatically calculate appropriate distance (fuzziness) of identified text.
  • New: Added ignore lists to filter profiles to support having a list of terms that are never filtered.
  • New: Added support for using custom dictionaries in filter profiles. (Can now specify your own list of terms to be filtered.)
  • New: Added an explanation endpoint that describes how the identified PII/PHI was detected and filtered.
  • New: Added metrics per individual filter type.
  • New: Added “prefix” property for metrics to allow for improved metric organization.
  • New: Filter sensitivity level is now applied to NER entities.
  • New: Added API for managing filter profiles.
  • Fix: Fixed filter profile issue where appropriate filtering strategy may not be applied.

Version 1.0.1 – October 19, 2019

  • Tweak: Changed API HTTP response message when Philter is initializing.
  • Tweak: API endpoint /api/replacements returns HTTP 503 Service Unavailable when the replacement store is not enabled.
  • Improvement: Updated how identified spans are located.

Version 1.0.0 – October 7, 2019

  • Initial public release.
  • Known issue: Philter’s API /api/filter endpoint will return HTTP 500 if Philter has not finished initializing. This will be made more user-friendly in a later version. As a workaround, use the /api/status endpoint to determine if Philter has finished initializing prior to calling /api/filter.
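
The workaround described in the known issue above could be sketched in Python like this: poll /api/status until it responds successfully, and only then start calling /api/filter. Treating an HTTP 200 response as "finished initializing" is an assumption of this sketch, as are the retry count and delay; the release notes do not describe the status response in detail. Both function names are hypothetical.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def status_ok(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET {base_url}/api/status answers with HTTP 200."""
    try:
        url = f"{base_url}/api/status"
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # Assumption: HTTP 200 means Philter has finished initializing.
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and HTTP error codes all count
        # as "not ready yet".
        return False


def wait_until_ready(check: Callable[[], bool], attempts: int = 30, delay: float = 2.0) -> bool:
    """Call check() until it returns True or the attempts run out."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

A caller might use it as `wait_until_ready(lambda: status_ok("https://philter.example.com:8080"))`, where the host and port are placeholders. If Philter is served with a self-signed certificate, `urlopen` will reject the connection unless an appropriate SSL context is supplied.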