Using Philter in the AWS Reference Architecture for HIPAA

Philter is a featured solution in the AWS Marketplace Healthcare Compliance category!

AWS has provided a HIPAA Reference Architecture for applications that contain protected health information (PHI). This reference architecture gives us a starting point for a highly-available architecture that spans multiple availability zones and includes three VPCs for management, production, and development resources; centralized logging; a VPN for customer connectivity; and a set of AWS Config rules. The source code for the reference architecture is available.

Philter is our software that redacts PHI, PII, and other sensitive information from text and documents. Philter runs in your cloud so your data never leaves your network to be redacted and its API allows for integration into virtually any application or system. In this blog post we will look at how Philter can be deployed via the AWS Marketplace inside an architecture developed using the AWS HIPAA Reference Architecture.

AWS Reference Architecture for HIPAA with Philter

The image below shows the AWS Reference Architecture for HIPAA with Philter deployments.

HIPAA Reference Architecture with Philter

Redacting PHI in your application

Suppose your application must redact PHI prior to processing and you want to deploy Philter in this reference architecture. So how does Philter fit into this architecture? The answer is: seamlessly. Philter can be deployed from the AWS Marketplace into one of the private subnets in the production VPC. From there, Philter’s API will be available to the rest of your application, which can send text to Philter for redaction and receive back the redacted text. The VPC flow logs configured as part of the reference architecture will capture the network traffic, and Philter’s application and system logs can be sent to CloudWatch Logs.
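As a concrete illustration, a client in the production VPC might call Philter’s REST API like this. This is a minimal sketch, not a definitive implementation: the hostname, port, the /api/filter path, and the c (context) parameter are assumptions about the deployment, so check your Philter documentation for the exact endpoint.

```python
import ssl
import urllib.request

# Hypothetical internal hostname for the Philter instance in the private subnet.
PHILTER_ENDPOINT = "https://philter.internal.example.com:8080"

def redact(text, context="default"):
    """Send text to Philter's filter API and return the redacted text."""
    # Philter ships with a self-signed certificate by default; this sketch
    # skips verification. Install an organizational certificate in production
    # and remove these three lines.
    ssl_context = ssl.create_default_context()
    ssl_context.check_hostname = False
    ssl_context.verify_mode = ssl.CERT_NONE

    request = urllib.request.Request(
        f"{PHILTER_ENDPOINT}/api/filter?c={context}",
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )
    with urllib.request.urlopen(request, context=ssl_context, timeout=10) as response:
        return response.read().decode("utf-8")
```

Your application would call redact() with the raw text and receive back the same text with the PHI removed.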

If you want to customize Philter’s configuration, you can create a custom AMI based on the Philter AMI from the AWS Marketplace. This may be useful if you want to "bake in" configuration for sending logs to CloudWatch Logs or an organizational SSL certificate.

Highly-available Philter deployment

For a highly available Philter deployment, create an Auto Scaling group for the Philter AMI in the two private subnets of the production VPC. Create a load balancer in the production VPC and register the Auto Scaling group with it. You now have a single endpoint for Philter’s API with a load-balanced, highly-available set of instances behind the load balancer. You can also configure scaling policies if you would like the Philter instances to scale out or in based on network traffic (or some other metric).
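The steps above can be sketched with boto3-style parameters. Everything here — the resource names, subnet IDs, AMI ID, and target group ARN — is a placeholder; substitute the values from your own production VPC.

```python
# Launch template for the Philter Marketplace AMI (all IDs are placeholders).
# You would pass this to ec2.create_launch_template() with boto3.
launch_template = {
    "LaunchTemplateName": "philter-lt",
    "LaunchTemplateData": {
        "ImageId": "ami-0123456789abcdef0",   # Philter AMI from the Marketplace
        "InstanceType": "m5.large",
        "BlockDeviceMappings": [
            # Encrypted EBS volume for encryption of data at rest.
            {"DeviceName": "/dev/xvda", "Ebs": {"Encrypted": True}}
        ],
    },
}

# Auto Scaling group spanning the two private subnets, registered with the
# load balancer's target group. You would pass this to
# autoscaling.create_auto_scaling_group().
auto_scaling_group = {
    "AutoScalingGroupName": "philter-asg",
    "LaunchTemplate": {"LaunchTemplateName": "philter-lt", "Version": "$Latest"},
    "MinSize": 2,
    "MaxSize": 4,
    "DesiredCapacity": 2,
    # One private subnet per availability zone in the production VPC.
    "VPCZoneIdentifier": "subnet-aaaa1111,subnet-bbbb2222",
    "TargetGroupARNs": [
        "arn:aws:elasticloadbalancing:us-east-1:123456789012:"
        "targetgroup/philter/0123456789abcdef"
    ],
}
```

With MinSize of 2 across two subnets, an instance (or an entire availability zone) can fail and the load balancer still has a healthy Philter instance to route to.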

Data encryption

Network traffic to Philter is encrypted. By default, Philter uses a self-signed certificate, but you can replace it with a certificate for your organization. Also, when deploying the Philter instances, be sure to use encrypted EBS volumes. Together, these two items give you encryption of data at rest and in motion for your Philter instances.

Development

You will also want to deploy an instance of Philter in the private subnet of the development VPC. This will give you an instance of Philter to use while developing and testing your application. This Philter instance can be a smaller instance type, such as a t3.large, to save cost.

Get started

To get started, deploy Philter from the AWS Marketplace.


Philter Managed Deployment

Philter Managed Deployment on the AWS Marketplace

Today we are excited to announce the availability of the Philter Managed Deployment on the AWS Marketplace, which lets you quickly deploy a pre-configured instance of Philter into a HIPAA-compliant AWS VPC.

A common challenge of using a product like Philter in the cloud is ensuring compliance with the requirements of HIPAA. With the Philter Managed Deployment, our team of AWS-certified engineers will construct a HIPAA-compliant cloud architecture to support Philter and your document workload.

To get started visit the Philter Managed Deployment on the AWS Marketplace and click the Continue button. Complete the form and click Send Request to Seller. This does not obligate you to anything. We will receive your request and reach out to begin the conversation.


Apache NiFi for Processing PHI Data

With the recent release of Apache NiFi 1.10.0, it seems like a good time to discuss using Apache NiFi with data containing protected health information (PHI). The presence of PHI raises significant concerns and imposes many requirements you may not face otherwise, due to regulations such as HIPAA.

Apache NiFi probably needs little introduction, but in case you are new to it: Apache NiFi is a big-data ETL application that uses directed graphs, called data flows, to move and transform data. You can think of it as taking data from one place to another while, optionally, doing some transformation to the data. The data goes through the flow in a construct known as a flow file. In this post we'll consider a simple data flow that reads files from a remote SFTP server and uploads them to S3. We don't need a complex data flow to understand how PHI can impact our setup.

Encryption of Data at Rest and In-motion

Two core things to address when PHI is present are encryption of the data at rest and encryption of the data in motion. The first step is to identify the places where sensitive data will be at rest and in motion.

For encryption of data at rest, the first location is the remote SFTP server. In this example, let's assume the remote SFTP server is not managed by us, has the appropriate safeguards, and is someone else's responsibility. As the data goes through the NiFi flow, the next place it is at rest is inside NiFi's provenance repository. (The provenance repository stores the history of all flow files that pass through the data flow.) NiFi then uploads the files to S3. AWS gives us the capability to encrypt S3 bucket contents by default, so we will enable default encryption on the bucket.
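On the S3 side, default encryption is enabled as a bucket-level setting. Below is a hedged sketch of the configuration you would pass to s3.put_bucket_encryption() in boto3 (or `aws s3api put-bucket-encryption`); the bucket name is a placeholder.

```python
# Destination bucket (placeholder name) and its default-encryption rule.
bucket = "phi-landing-bucket"

encryption_configuration = {
    "Rules": [
        {
            "ApplyServerSideEncryptionByDefault": {
                # Use "aws:kms" plus a KMSMasterKeyID to encrypt with a
                # customer-managed KMS key instead of S3-managed keys.
                "SSEAlgorithm": "AES256"
            }
        }
    ]
}
```

With this rule in place, any object NiFi uploads without an explicit encryption header is still encrypted server-side by default.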

For encryption of data in motion, we have the connection between the SFTP server and NiFi and between NiFi and S3. Since we are using an SFTP server, our communication to the SFTP server will be encrypted. Similarly, we will access S3 over HTTPS providing encryption there as well.

If we are using a multi-node NiFi cluster, we must also consider the communication between the NiFi nodes. If the flow executes on only a single node, you might argue that encryption between the nodes is not necessary. However, what happens in the future when the flow's behavior changes and PHI is suddenly transmitted in plain text across the network? For that reason, it's best to set up encryption between NiFi nodes from the start. This is covered in the NiFi System Administrator's Guide.
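As a rough illustration, securing node-to-node communication comes down to a handful of nifi.properties entries. The paths and passwords below are placeholders only; consult the NiFi System Administrator's Guide for the full set of security properties and how to generate the keystores.

```
# nifi.properties (illustrative values only; paths and passwords are placeholders)
nifi.cluster.protocol.is.secure=true
nifi.remote.input.secure=true
nifi.security.keystore=/opt/nifi/conf/keystore.jks
nifi.security.keystoreType=JKS
nifi.security.keystorePasswd=changeit
nifi.security.truststore=/opt/nifi/conf/truststore.jks
nifi.security.truststoreType=JKS
nifi.security.truststorePasswd=changeit
```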

Encrypting Apache NiFi's Data at Rest

The best way to ensure encryption of data at rest is to use full disk encryption for the NiFi instances. (If you are on AWS and running NiFi on EC2 instances, use an encrypted EBS volume.) This ensures that all data persisted on the system will be encrypted no matter where the data appears. If a NiFi processor decides to have a bad day and dump error data to the log there is a risk of PHI data being included in the log. With full disk encryption we can be sure that even that data is encrypted as well.

Looking at Other Methods

Let's recap the NiFi repositories:

- The flowfile repository stores the state and attributes of the flow files moving through the flow.
- The content repository stores the content of the flow files.
- The provenance repository stores the history of the flow files.

PHI could exist in any of these repositories when PHI data is passing through a NiFi flow. NiFi does have an encrypted provenance repository implementation, and NiFi 1.10.0 introduces an experimental encrypted content repository, but there are some caveats. (Currently, NiFi does not have an implementation of an encrypted flowfile repository.)

Even when using these encrypted repositories, spillage of PHI onto the file system through a log file or some other means remains a risk, since only the repositories themselves are encrypted. There is also a bit of overhead from the additional CPU instructions needed to perform the encryption. Compare this with using an encrypted EBS volume: we don't have to worry about spilling unencrypted PHI to the disk, and per the AWS EBS encryption documentation, "You can expect the same IOPS performance on encrypted volumes as on unencrypted volumes, with a minimal effect on latency."

There is also the NiFi EncryptContent processor, which can encrypt (and, despite the name, decrypt) the content of flow files. This processor is useful in very specific cases, but encrypting data at the level of the data flow for compliance reasons is not recommended, since the data may still exist unencrypted elsewhere in the NiFi repositories.

Removing PHI from Text in a NiFi Flow

What if you want to remove PHI (and PII) from the content of flow files as they go through a NiFi data flow? Check out our product Philter. It can find and remove many types of PHI and PII from natural language, unstructured text from within a NiFi flow. Text containing PHI is sent to Philter, and Philter responds with the same text but with the PHI and PII removed.
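Within a flow, one way to wire this up is with NiFi's standard InvokeHTTP processor pointed at Philter's API. The settings below are an illustrative sketch; the URL is a placeholder for your Philter instance, and the exact path should be checked against your Philter documentation.

```
InvokeHTTP processor (illustrative settings)
  HTTP Method   : POST
  Remote URL    : https://philter.internal.example.com:8080/api/filter
  Content-Type  : text/plain
```

The processor's response, carrying the redacted text, then continues through the rest of the flow in place of the original content.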

Conclusion

Full disk encryption and encrypting all connections in the NiFi flow and between NiFi nodes provide encryption of data at rest and in motion. It's also recommended that you check with your organization's compliance officer to determine whether there are any other requirements imposed by your organization or other relevant regulations prior to deployment. It's best to gather that information up front to avoid rework later!

Need more help?

We provide consulting services around AWS and big-data tools like Apache NiFi. Get in touch by sending us a message. We look forward to hearing from you!