Philter is a featured product in the AWS Marketplace's Healthcare Compliance category

We are excited to share that Philter is now a featured product in the AWS Marketplace's Healthcare Compliance category.

The products selected for this feature “help ensure that IT infrastructure is compliant with changing policies and regulations, allowing teams to focus on driving patient-centric innovation.”

 Philter on AWS for Healthcare Compliance Data Sheet

Philter redacts PHI, PII, and other sensitive information from documents and text. With Philter, users can select the types of sensitive information to redact, anonymize, encrypt, or tokenize.

Philter can be launched in your AWS cloud via the AWS Marketplace in just a few minutes. Philter runs entirely within your private VPC so your sensitive data never has to leave your VPC.


Using Philter in the AWS Reference Architecture for HIPAA

Philter is a featured solution in the AWS Marketplace Healthcare Compliance category!

AWS provides a HIPAA Reference Architecture for applications that contain protected health information (PHI). This reference architecture gives us a starting point for a highly available architecture that spans multiple availability zones. It includes three VPCs for management, production, and development resources, a VPN for customer connectivity, logging, and a set of AWS Config rules. The source code for the reference architecture is available.

Philter is our software that redacts PHI, PII, and other sensitive information from text and documents. Philter runs in your cloud, so your data never leaves your network to be redacted, and its API allows for integration into virtually any application or system. In this blog post we will look at how Philter can be deployed via the AWS Marketplace inside an architecture built on the AWS HIPAA Reference Architecture.

AWS Reference Architecture for HIPAA with Philter

The image below shows the AWS Reference Architecture for HIPAA with Philter deployments.

HIPAA Reference Architecture with Philter

Redacting PHI in your application

Suppose your application must redact PHI prior to processing and you want to deploy Philter in this reference architecture. So how does Philter fit into this architecture? The answer is seamlessly. Philter can be deployed from the AWS Marketplace into one of the private subnets in the production VPC. From there, Philter’s API will be available to the rest of your application. Your application can now send data to Philter for redaction and receive back the redacted text. The VPC flow logs configured as part of the reference architecture will capture the network traffic, and Philter’s application and system logs can be sent to CloudWatch Logs.
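As a rough illustration of that request flow, the sketch below posts text to Philter's filter endpoint from Python using only the standard library. The `/api/filter` path and port 8080 match the way Philter's API is called in our Kinesis Firehose example. The host name `philter.internal.example` and the helper names are hypothetical, and certificate verification is left at Python's defaults, so this assumes Philter presents a certificate your application trusts.

```python
import urllib.request

# Hypothetical internal DNS name for the Philter instance in the production VPC.
PHILTER_HOST = "philter.internal.example"


def build_filter_request(text, host=PHILTER_HOST, port=8080):
    """Build a POST request that sends plain text to Philter's filter endpoint."""
    return urllib.request.Request(
        url=f"https://{host}:{port}/api/filter",
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def redact(text, host=PHILTER_HOST, timeout=20):
    """Send text to Philter and return the redacted text from the response body."""
    request = build_filter_request(text, host)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Your application would call something like `redact("He lived in 90210.")` before passing the text on to downstream processing.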

If you want to customize the configuration of Philter, you can launch an instance from the Philter AMI on the AWS Marketplace, make your changes, and create your own AMI from that instance. This may be useful if you want to "bake in" configuration for sending logs to CloudWatch Logs or an organizational SSL certificate.

Highly-available Philter deployment

For a highly available Philter deployment, create an Auto Scaling group for the Philter AMI across the two private subnets of the production VPC. Create a load balancer in the production VPC and register the Auto Scaling group with it. You now have a single endpoint for Philter’s API with a load-balanced, highly available set of instances behind it. You can configure scaling policies if you would like the Philter instances to scale out or in based on network traffic (or some other metric).
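The steps above could be scripted with boto3 along the following lines. This is a minimal sketch, not the reference architecture's own tooling: it assumes you have already created a launch template for the Philter AMI and a target group behind your load balancer, and the group name, sizes, and function names are hypothetical.

```python
def build_auto_scaling_group_kwargs(launch_template_id, private_subnet_ids,
                                    target_group_arn, min_size=2, max_size=4):
    """Build the arguments for boto3's create_auto_scaling_group call."""
    return {
        "AutoScalingGroupName": "philter",  # hypothetical group name
        "LaunchTemplate": {
            "LaunchTemplateId": launch_template_id,
            "Version": "$Latest",
        },
        "MinSize": min_size,
        "MaxSize": max_size,
        # Comma-separated subnet IDs spread instances across availability zones.
        "VPCZoneIdentifier": ",".join(private_subnet_ids),
        # Registers the group's instances with the load balancer's target group.
        "TargetGroupARNs": [target_group_arn],
        "HealthCheckType": "ELB",
        "HealthCheckGracePeriod": 300,
    }


def create_philter_group(launch_template_id, private_subnet_ids, target_group_arn):
    """Create the Auto Scaling group (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the builder above works without boto3

    autoscaling = boto3.client("autoscaling")
    kwargs = build_auto_scaling_group_kwargs(
        launch_template_id, private_subnet_ids, target_group_arn
    )
    return autoscaling.create_auto_scaling_group(**kwargs)
```

Keeping the argument builder separate from the API call makes it easy to review the settings before anything is created in your account.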

Data encryption

Network traffic to Philter will be encrypted. By default Philter uses a self-signed certificate, but you can replace it with a certificate for your organization. Also, when deploying the Philter instances, be sure to use encrypted EBS volumes. Together, these two items give you encryption of data in transit and at rest for your Philter instances.
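One way to make sure every Philter instance gets an encrypted volume is to set it in a launch template. The sketch below builds launch template data for boto3's `create_launch_template` call. The root device name `/dev/xvda`, the volume size, and the function names are assumptions for illustration, so check them against the Philter AMI you launch.

```python
def build_launch_template_data(ami_id, instance_type, volume_size_gib=20):
    """Build launch template data that requests an encrypted EBS root volume."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "BlockDeviceMappings": [
            {
                "DeviceName": "/dev/xvda",  # assumed root device name
                "Ebs": {
                    "Encrypted": True,  # encryption of data at rest
                    "VolumeSize": volume_size_gib,
                    "DeleteOnTermination": True,
                },
            }
        ],
    }


def create_philter_launch_template(ami_id, instance_type):
    """Create the launch template (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the builder above works without boto3

    ec2 = boto3.client("ec2")
    return ec2.create_launch_template(
        LaunchTemplateName="philter",  # hypothetical template name
        LaunchTemplateData=build_launch_template_data(ami_id, instance_type),
    )
```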

Development

You will also want to deploy an instance of Philter in the private subnet of the development VPC. This will give you an instance of Philter to use while developing and testing your application. This Philter instance can be a smaller instance type, such as a t3.large, to save cost.

Get started

To get started, deploy Philter from the AWS Marketplace.


Philter Managed Deployment

Philter Managed Deployment on the AWS Marketplace

Today we are excited to announce the availability of Philter Managed Deployment on the AWS Marketplace, which lets you quickly deploy a pre-configured instance of Philter into a HIPAA-compliant AWS VPC.

Often, a challenge of using a product like Philter in the cloud is ensuring compliance with the requirements of HIPAA. With the Philter Managed Deployment, our team of AWS-certified engineers will construct a HIPAA-compliant cloud architecture to support Philter and your document workload.

To get started, visit the Philter Managed Deployment on the AWS Marketplace and click the Continue button. Complete the form and click Send Request to Seller. This does not obligate you to anything. We will receive your request and reach out to begin the conversation.


Using AWS Kinesis Firehose Transformations to Filter Sensitive Information from Streaming Text

  • Updated 07/12/2020 to include a link to a similar solution using log4j and Apache Kafka.
  • Updated 05/20/2020 to include a link to running Philter as a container and a link to the solution example.
  • Updated 04/28/2020 to include a link to CloudFormation and Terraform scripts and link to using a signed certificate with Philter.

AWS Kinesis Firehose is a managed streaming service designed to take large amounts of data from one place to another. For example, you can take data from places such as CloudWatch, AWS IoT, and custom applications using the AWS SDK to places such as Amazon S3, Amazon Redshift, Amazon Elasticsearch, and others. In this post we will use S3 as the firehose's destination.

In some cases you may need to manipulate the data as it goes through the firehose to remove sensitive information. In this blog post we will show how AWS Kinesis Firehose and AWS Lambda can be used in conjunction with Philter to remove sensitive information (PII and PHI) from the text as it travels through the firehose.

A similar solution that uses log4j and Apache Kafka to remove sensitive information from application logs is also available.

Prerequisites

You must have a running instance of Philter. If you don't already have one, you can launch one through the AWS Marketplace or as a container. There are CloudFormation and Terraform scripts for launching a single instance of Philter or a load-balanced, auto-scaled set of Philter instances.

It's not required that the instance of Philter be running in AWS but it is required that the instance of Philter be accessible from your AWS Lambda function. Running Philter and your AWS Lambda function in your own VPC allows you to communicate locally with Philter from the function.

Setting up the AWS Kinesis Firehose Transformation

There is no need to duplicate an excellent blog post on creating a Firehose Data Transformation with AWS Lambda. Instead, refer to the linked page and substitute the Python 3 code below for the code in that blog post.

Configuring the Firehose and the Lambda Function

To start, create a Kinesis Firehose delivery stream and configure an AWS Lambda transformation. When creating the AWS Lambda function, select a Python 3 runtime (this post was written against Python 3.7) and use the following code:

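import base64
import ssl
import urllib.request

# Replace PHILTER_IP with the address of your Philter instance.
PHILTER_URL = "https://PHILTER_IP:8080/api/filter"

# Philter ships with a self-signed certificate, so verification is disabled here.
# Use a valid signed certificate and remove these two settings in production.
SSL_CONTEXT = ssl.create_default_context()
SSL_CONTEXT.check_hostname = False
SSL_CONTEXT.verify_mode = ssl.CERT_NONE

def handler(event, context):

    output = []

    for record in event['records']:
        # Firehose delivers each record's data base64 encoded.
        payload = base64.b64decode(record['data'])
        headers = {'Content-Type': 'text/plain'}
        # The standard library is used so no extra packages need to be bundled.
        request = urllib.request.Request(PHILTER_URL, data=payload, headers=headers, method='POST')
        with urllib.request.urlopen(request, timeout=20, context=SSL_CONTEXT) as response:
            filtered = response.read()
        output_record = {
            'recordId': record['recordId'],
            'result': 'Ok',
            'data': base64.b64encode(filtered + b'\n').decode('utf-8')
        }
        output.append(output_record)

    # Firehose expects the transformed records under the "records" key.
    return {'records': output}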

The following Kinesis Firehose test event can be used to test the function:

{
  "invocationId": "invocationIdExample",
  "deliveryStreamArn": "arn:aws:kinesis:EXAMPLE",
  "region": "us-east-1",
  "records": [
    {
      "recordId": "49546986683135544286507457936321625675700192471156785154",
      "approximateArrivalTimestamp": 1495072949453,
      "data": "SGUgbGl2ZWQgaW4gOTAyMTAgYW5kIGhpcyBTU04gd2FzIDEyMy00NS02Nzg5Lg=="
    },
    {
      "recordId": "49546986683135544286507457936321625675700192471156785154",
      "approximateArrivalTimestamp": 1495072949453,
      "data": "SGUgbGl2ZWQgaW4gOTAyMTAgYW5kIGhpcyBTU04gd2FzIDEyMy00NS02Nzg5Lg=="
    }
  ]
}

This test event contains two records, and the data for each is the base64-encoded value "He lived in 90210 and his SSN was 123-45-6789." When the test is executed, the function returns one record for each input record. Once base64 decoded, the data of each returned record will be:

He lived in {{{REDACTED-zip-code}}} and his SSN was {{{REDACTED-ssn}}}.

When executing the test, the AWS Lambda function extracts the data from each record in the firehose and submits it to Philter for filtering. The filtered text from each request is base64 encoded and returned from the function along with the record's ID and a result status. Note that our Python function does not verify Philter's self-signed certificate. We recommend using a valid signed certificate for Philter and enabling verification.
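If you want to test with your own text, the following sketch builds a test event from plain strings and decodes the data of the records that come back. The helper names and the `record-N` IDs are hypothetical (Firehose generates its own record IDs), and the other event fields simply mirror the test event shown above.

```python
import base64


def make_test_event(texts):
    """Build a Firehose-style test event with one base64-encoded record per text."""
    records = [
        {
            "recordId": f"record-{index}",  # hypothetical ID for testing only
            "approximateArrivalTimestamp": 1495072949453,
            "data": base64.b64encode(text.encode("utf-8")).decode("ascii"),
        }
        for index, text in enumerate(texts)
    ]
    return {
        "invocationId": "invocationIdExample",
        "region": "us-east-1",
        "records": records,
    }


def decode_records(records):
    """Return the base64-decoded text of each record's data field."""
    return [base64.b64decode(record["data"]).decode("utf-8") for record in records]
```

Passing the result of `make_test_event([...])` through `json.dumps` gives you JSON that can be pasted into the Lambda console as a test event.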

Now, when data is published to the Kinesis Firehose stream, it will be processed by the AWS Lambda function and Philter prior to exiting the firehose at its configured destination.
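In a production pipeline you may also want to decide what happens when Philter cannot be reached. The sketch below is one possible approach, not part of the function above: it wraps the per-record work in a helper that returns Firehose's `ProcessingFailed` result when filtering raises an exception. The `transform_record` name and the `filter_text` callable (any function that takes text and returns the filtered text) are hypothetical.

```python
import base64


def transform_record(record, filter_text):
    """Filter one Firehose record, marking it as failed if filtering raises."""
    try:
        text = base64.b64decode(record["data"]).decode("utf-8")
        filtered = filter_text(text)
        data = base64.b64encode((filtered + "\n").encode("utf-8")).decode("ascii")
        return {"recordId": record["recordId"], "result": "Ok", "data": data}
    except Exception:
        # Hand the original data back and let Firehose treat the record
        # as unsuccessfully processed.
        return {
            "recordId": record["recordId"],
            "result": "ProcessingFailed",
            "data": record["data"],
        }
```

Records that fail are handled according to your delivery stream's error settings, so check where they are written, since they have not been redacted.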

Processing Data

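We can use the AWS CLI to publish data to our Kinesis Firehose delivery stream, which is named sensitive-text. (With version 2 of the AWS CLI, also pass --cli-binary-format raw-in-base64-out so that the text is not interpreted as base64.)

aws firehose put-record --delivery-stream-name sensitive-text --record '{"Data":"He lived in 90210 and his SSN was 123-45-6789."}'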

Once the delivery stream has flushed its buffer, check the destination S3 bucket and you will have a single object with the following line:

He lived in {{{REDACTED-zip-code}}} and his SSN was {{{REDACTED-ssn}}}.
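The same record can be published from Python with boto3, which is how a custom application would typically write to the stream. This is a minimal sketch: the function names are hypothetical, and it assumes boto3 is installed and AWS credentials are configured.

```python
def build_put_record_kwargs(text, stream_name="sensitive-text"):
    """Build the arguments for boto3's Firehose put_record call."""
    return {
        "DeliveryStreamName": stream_name,
        # boto3 takes raw bytes here and handles the encoding itself.
        "Record": {"Data": text.encode("utf-8")},
    }


def publish(text, stream_name="sensitive-text"):
    """Publish one record to the delivery stream (requires boto3 and credentials)."""
    import boto3  # imported here so the builder above works without boto3

    firehose = boto3.client("firehose")
    return firehose.put_record(**build_put_record_kwargs(text, stream_name))
```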

Conclusion

In this blog post we have created an AWS Firehose pipeline that uses an AWS Lambda function to remove PII and PHI from the text in the streaming pipeline.


Jeff Zemerick is the founder of Mountain Fog. He is a certified AWS and Google Cloud engineer many times over, current chair of the Apache OpenNLP project, and experienced software engineer. You can visit his website at https://jeffzemerick.dev, or contact Jeff at jeff.zemerick@mtnfog.com or on LinkedIn.