Getting Started

The safest way to manage sensitive information is to apply safeguards before it can enter your systems. Phirestream sits in front of Apache Kafka and redacts sensitive information, such as Protected Health Information (PHI) and Personally Identifiable Information (PII), from your streams before they are published to an Apache Kafka topic.

Launching Phirestream

Follow your cloud provider’s steps for launching Phirestream in your cloud. Once Phirestream has been launched and its virtual machine is running, continue with this guide to configure Phirestream.

Phirestream can be used with self-managed Apache Kafka clusters and managed hosting services, such as Amazon MSK, Confluent Cloud, and Instaclustr.

Configuring Phirestream

With Phirestream now running, we can configure it. Here we configure how Phirestream listens for incoming data to redact and the details of the downstream Apache Kafka brokers.

Open the Phirestream configuration file at /opt/phirestream/config/. Set the value of the kafka.bootstrap.servers property to the location of your Apache Kafka broker(s). Then use the command below to restart Phirestream so that the change takes effect. (For a full list of the available Phirestream settings, see Settings.)

sudo systemctl restart phirestream
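
If you prefer to script the property change, the following is a minimal sketch. It assumes the configuration file uses one key=value pair per line, which this guide does not confirm. The set_property helper and the broker1/broker2 host names are hypothetical, and the demonstration runs against a scratch file rather than the real configuration file.

```shell
# Hypothetical helper: set key=value in a properties-style file, appending the
# key if it is not already present. Assumes one key=value pair per line.
set_property() {
  file="$1"; key="$2"; value="$3"
  # Escape dots so that the key is matched literally by grep and sed.
  pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
  if grep -q "^${pattern}=" "$file"; then
    # '|' is the sed delimiter, so host:port lists need no escaping.
    sed -i.bak "s|^${pattern}=.*|${key}=${value}|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Demonstration on a scratch file, not the real Phirestream configuration.
demo=$(mktemp)
printf 'kafka.bootstrap.servers=localhost:9092\n' > "$demo"
set_property "$demo" kafka.bootstrap.servers broker1:9092,broker2:9092
cat "$demo"
```

To use it for real, pass the path of your actual configuration file as the first argument and restart Phirestream afterwards. The sed call leaves a .bak copy of the previous file next to it.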

Once Phirestream restarts, we are ready to publish and redact text. Phirestream’s API endpoint is accessible at https://phirestream:8080/, where phirestream is the IP address or DNS name of the Phirestream virtual machine.

Using Phirestream to Redact Text

Phirestream implements Apache Kafka’s REST API. This means that Phirestream can be a drop-in solution for redacting text in your streaming data pipelines.

The following command publishes a single message to Phirestream. In this request, the text "George Washington was president." is published to the Apache Kafka topic default.

curl -k -X POST \
  https://localhost:8080/topics/default \
  -H 'Content-Type: application/vnd.kafka.json.v2+json' \
  -d '{
    "records": [
        {
            "key": "key-1",
            "value": "George Washington was president."
        }
    ]
  }'
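
For publishing arbitrary keys and values from a script, the request body can be built programmatically. The sketch below is one way to do that. The helpers json_escape, build_payload, and publish are hypothetical names, and the escaping only handles backslashes and double quotes, not control characters such as newlines.

```shell
# Escape backslashes first, then double quotes, so the text is safe inside a
# JSON string. Control characters (newlines, tabs) are not handled here.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build a single-record request body in the shape used by the curl example.
build_payload() {
  printf '{"records":[{"key":"%s","value":"%s"}]}' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

# Publish one record. $1 = base URL, $2 = topic, $3 = key, $4 = value.
# -k skips TLS certificate verification, as in the example above.
publish() {
  curl -k -s -X POST "$1/topics/$2" \
    -H 'Content-Type: application/vnd.kafka.json.v2+json' \
    -d "$(build_payload "$3" "$4")"
}

# Print the body that would be sent for the example message.
build_payload key-1 'George Washington was president.'
```

A call such as publish https://localhost:8080 default key-1 'George Washington was president.' would then send the same request as the curl command above.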

Consuming the Redacted Text

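Now, we will use Apache Kafka's console consumer (the kafka-console-consumer.sh script that ships with Apache Kafka) to consume from the default topic and get the redacted message:

kafka-console-consumer.sh \
   --topic default \
   --bootstrap-server localhost:9092 \
   --from-beginning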

The output of the command is a single message with the following content:

{{{REDACTED-entity}}} was president.
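
When checking consumed messages from a script, one option is to look for the redaction placeholder. The sketch below assumes that placeholders always take the {{{REDACTED-...}}} form seen in the output above, which may not hold for every filter profile. The has_redaction_marker helper is hypothetical and not part of Phirestream.

```shell
# Succeeds (exit status 0) when the text contains a {{{REDACTED-...}}}
# placeholder, and fails (exit status 1) otherwise.
has_redaction_marker() {
  case "$1" in
    *'{{{REDACTED-'*'}}}'*) return 0 ;;
    *) return 1 ;;
  esac
}

if has_redaction_marker '{{{REDACTED-entity}}} was president.'; then
  echo "message was redacted"
fi
```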

In this example we can see that Phirestream received the request, redacted the person’s name as sensitive information, and published the modified data to Apache Kafka.

The types of sensitive information that Phirestream identifies are defined in files called filter profiles. A filter profile specifies which types of sensitive information to look for and how to redact each type. Phirestream selects which filter profile to apply based on the name of the Apache Kafka topic. In the example above, the topic name was default, so the filter profile named default was applied.
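
Because the topic name appears in the request path, choosing a topic also chooses the filter profile. The sketch below builds the publish URL for a given topic. The topic_endpoint helper and the patients topic are hypothetical, and the sketch says nothing about what happens when no profile matches the topic name.

```shell
# Hypothetical helper: print the publish URL for a topic. Per the rule above,
# the topic name is also the name of the filter profile that is applied.
topic_endpoint() {
  base_url="$1"; topic="$2"
  # Strip one trailing slash from the base URL to avoid a double slash.
  printf '%s/topics/%s\n' "${base_url%/}" "$topic"
}

topic_endpoint https://localhost:8080/ default
topic_endpoint https://localhost:8080 patients
```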

You are now ready to begin using Phirestream to manage sensitive information in your streaming text!