Phirestream with Amazon Managed Streaming for Apache Kafka (MSK)

Phirestream can redact sensitive information, such as personally identifiable information (PII) and protected health information (PHI), from streaming text in Amazon Managed Streaming for Apache Kafka (MSK) clusters. This guide assumes you already have an Apache Kafka cluster running in Amazon MSK. If you do not, refer to the AWS documentation for creating an MSK cluster.

Launch Phirestream in your AWS account.

Phirestream AWS Architecture

Phirestream works as a proxy in front of Apache Kafka and Amazon MSK. It exposes a REST interface that accepts messages, redacts the sensitive information they contain, and then produces the redacted messages to the Kafka brokers.

AWS MSK Cluster Configuration

An example MSK cluster configuration is shown below:

auto.create.topics.enable=true
default.replication.factor=2
min.insync.replicas=2
num.io.threads=8
num.network.threads=5
num.partitions=1
num.replica.fetchers=2
replica.lag.time.max.ms=30000
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
socket.send.buffer.bytes=102400
unclean.leader.election.enable=true
zookeeper.session.timeout.ms=18000
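
When adapting the example configuration, it can help to check how the replication settings interact. The following is a minimal sketch, not part of Phirestream or MSK: the helper names parse_properties and tolerated_replica_losses are hypothetical, and the check assumes a producer that waits for all in-sync replicas (acks=all), which this guide does not confirm for Phirestream.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def tolerated_replica_losses(props):
    """Replicas that can fall out of sync while acks=all writes still succeed."""
    replication_factor = int(props["default.replication.factor"])
    min_insync = int(props["min.insync.replicas"])
    return replication_factor - min_insync


# Values taken from the example configuration above.
EXAMPLE = """\
default.replication.factor=2
min.insync.replicas=2
"""

props = parse_properties(EXAMPLE)
print(tolerated_replica_losses(props))
```

A result of 0 means that, under the acks=all assumption, writes to a partition are rejected whenever one of its replicas is out of sync. Raising default.replication.factor or lowering min.insync.replicas changes that trade-off.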

AWS MSK Security Group

The following are example security group rules to allow communication with the brokers using TLS. Customize these rules per your VPC and subnet settings. See the AWS MSK documentation for other ports.

Type        Protocol  Port  Source       Description
Custom TCP  TCP       9094  10.0.0.0/16  Brokers and consumers (TLS)
Custom TCP  TCP       2181  10.0.0.0/16  ZooKeeper
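
To confirm that a given host is covered by the source range in the rules above, the check can be sketched with Python's standard ipaddress module. The function name is_allowed and the sample addresses are illustrative assumptions. Substitute the CIDR block of your own VPC.

```python
import ipaddress

# Source range from the example security group rules.
ALLOWED_SOURCE = ipaddress.ip_network("10.0.0.0/16")


def is_allowed(client_ip, network=ALLOWED_SOURCE):
    """Return True if client_ip falls inside the security group's source range."""
    return ipaddress.ip_address(client_ip) in network


print(is_allowed("10.0.12.34"))  # inside 10.0.0.0/16
print(is_allowed("10.1.0.5"))    # outside 10.0.0.0/16
```

The host running Phirestream needs an address inside this range (or a matching rule of its own) to reach the brokers on port 9094.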

Phirestream Settings

Edit the /opt/phirestream/config/application.properties file to set the addresses of the MSK cluster:

kafka.security.protocol=SSL
kafka.bootstrap.servers=[msk-broker-addresses]

As an example:

kafka.security.protocol=SSL
kafka.bootstrap.servers=b-3.phirestream.xqole6.c16.kafka.us-east-1.amazonaws.com:9094,b-1.phirestream.xqole6.c16.kafka.us-east-1.amazonaws.com:9094,b-2.phirestream.xqole6.c16.kafka.us-east-1.amazonaws.com:9094
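
Because the broker list is long and easy to mistype, the property line can be generated rather than edited by hand. This is a minimal sketch with a hypothetical helper, bootstrap_servers_property. It assumes every broker should use the TLS port 9094 shown in this guide and rejects entries that specify a different port.

```python
def bootstrap_servers_property(brokers, tls_port=9094):
    """Build the kafka.bootstrap.servers line from broker hostnames.

    Entries may be "host" or "host:port"; a missing port defaults to tls_port.
    """
    entries = []
    for broker in brokers:
        host, sep, port = broker.strip().partition(":")
        if not sep:
            port = str(tls_port)
        elif port != str(tls_port):
            raise ValueError(f"{broker!r} does not use TLS port {tls_port}")
        entries.append(f"{host}:{port}")
    return "kafka.bootstrap.servers=" + ",".join(entries)


# Broker hostnames from the example above.
brokers = [
    "b-3.phirestream.xqole6.c16.kafka.us-east-1.amazonaws.com",
    "b-1.phirestream.xqole6.c16.kafka.us-east-1.amazonaws.com",
    "b-2.phirestream.xqole6.c16.kafka.us-east-1.amazonaws.com",
]
print("kafka.security.protocol=SSL")
print(bootstrap_servers_property(brokers))
```

The printed lines can be pasted into /opt/phirestream/config/application.properties.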

Restart Phirestream for the change to take effect:

sudo systemctl restart phirestream

Phirestream is now ready to receive your text via its Kafka-compliant REST API. The redacted text will be written to the MSK cluster on the appropriate topic. See the Getting Started guide for text redaction examples and refer to the AWS MSK documentation for consuming the redacted text.