Philter – A Real-World Use-Case

PhilterPhilter finds, identifies, and removes sensitive information from text. That’s a very good and short description of Philter, but, as they say, a picture is worth a thousand words. In this post we will detail an actual, real-world use-case of Philter as we paint a picture with words!

“Super Helpdesk”

The Philter customer, we’ll call them Super Helpdesk, is a provider of a software-as-a-service helpesk solution. Their customers sign-up to be able to offer a helpdesk to their customers. (Following? :) Super Helpdesk’s users need the ability to optionally prevent sensitive information from being passed through in tickets. If a customer enters something sensitive they want to remove it from the ticket before the ticket enters the workflow.

In this case, the sensitive information Super Helpdesk is most worried about are credit card numbers. Due to security best practices and regulations like PCI-DSS, credit card numbers cannot exist in helpdesk tickets where they may be stored or transmitted unencrypted. Super Helpdesk needed a way to analyze the tickets entering their system in order to filter out the credit card numbers from the tickets.

The Solution

At a high-level, Super Helpdesk deployed Philter (in this case running on EC2 in AWS) to perform the filtering of the content of the helpdesk tickets. As new helpdesk tickets are submitted, the content of the ticket is sent to Philter and Philter immediately returns the content of the ticket with the credit card numbers redacted to just the last four digits. (Super Helpdesk also added an option for their users to control how Philter redacts the credit card numbers, with the available options being redact all or redact all but the last four digits.)

Now for the low-level implementation details! When new helpdesk tickets come in they are published to an Apache Kafka topic. A process consumes from the topic, does processing on the ticket, and ultimately inserts the ticket into a backend database. This process, written in Java, was modified to make use of the Philter Java SDK to enable the communication between the process and Philter.

We have found this to actually be a very common Extract-Transform-Load (ETL) design scenario across industries. Data in the form of text flows from an external system through a pipeline facilitated by Apache Kafka or Amazon Kinesis Firehose into an internal database. Along the way the data needs to be manipulated in some manner. In our case the data manipulation is to remove sensitive information from the text. Philter’s API allows it to slide nearly seamlessly into the existing pipeline. Like Super Helpdesk did, just insert a step to send the text to Philter for filtering.

We made a previous blog post about using Philter inside of an AWS Kinesis Firehose using a Firehose Transformation. It describes how to make a Lambda function to invoke Philter on the text going through the pipeline to filter the text. Check it out at the link below.

Using AWS Kinesis Firehose Transformations to Filter Sensitive Information from Streaming Text

But, wait, why Philter?

You are probably saying, well, that seems like overkill for a simple problem to redact credit card numbers! Credit card numbers follow a well-defined pattern so why not just use a regular expression to find them? If all you want to do is find credit card numbers then a regular expression definitely may work.

So what does using Philter give us? A good bit actually. Through the use of filter profiles, Philter can have a pre-set list of types sensitive information. Each type of sensitive information can have its own redaction logic. For example, you could redact VISA card numbers while truncating AMEX card numbers. Or, you could only leave the last four digits of card numbers matching a condition. Additionally, each customer of the helpdesk platform may have different requirements around sensitive information. That logic can also be encapsulated in filter profiles. The regular expression logic just got more complicated.

Philter provides other features as well, such as the ability to capture metrics on the data, ability to encrypt the credit card numbers instead of removing them, and the ability to disambiguate between different types of sensitive information.

Lastly, a regular expression will never be able to find non-deterministic types of sensitive information like person’s names. Philter’s natural language processing (NLP) capabilities are able to find entities like person’s names that do not follow any set pattern.

Try Philter

Deploying Philter to AWS, Azure, or GCP is easy because Philter is available through each of the cloud’s marketplaces. Simply follow the marketplace steps to launch an instance of Philter in your private cloud.

 Philter Version 
Launch Philter on AWS1.6.0
Launch Philter on Azure1.6.1
Launch Philter on Google Cloud1.6.0

Share your experience!

We would love to hear how you are using Philter. Share your experience with us!


Jeff Zemerick is the founder of Mountain Fog. He is a 10x certified AWS engineer, current chair of the Apache OpenNLP project, and experienced software engineer.

You can contact Jeff at jeff.zemerick@mtnfog.com or on LinkedIn.