Filter Profiles

The types of PHI identified by Philter and how those identified values are manipulated are controlled through files called Filter Profiles. A Filter Profile is a JSON file stored under Philter’s profiles directory or in a Filter Profile Registry. Each Filter Profile has a name and that name is used to tell Philter which filter profile to use during processing. The name can be passed to Philter’s API when submitting text to Philter. This provides flexibility and allows you to process different types of documents with a single instance of Philter.

A Sample Filter Profile

The following is a sample filter profile. It shows the types of PHI/PII identifiers that are enabled and the strategy each one uses to manipulate that PHI/PII when found. To use this profile, save it as default.json in Philter’s profiles directory.

{
   "name":"default",
   "identifiers":{
      "ner":{
         "nerFilterStrategies":[
            {
               "strategy":"REDACT",
               "redactionFormat":"{{{REDACTED-%t}}}"
            }
         ]
      },
      "age":{
         "ageFilterStrategies":[
            {
               "strategy":"REDACT",
               "redactionFormat":"{{{REDACTED-%t}}}"
            }
         ]
      },
      "creditCard":{
         "creditCardFilterStrategies":[
            {
               "strategy":"REDACT",
               "redactionFormat":"{{{REDACTED-%t}}}"
            }
         ]
      },
      "date":{
         "dateFilterStrategies":[
            {
               "strategy":"REDACT",
               "redactionFormat":"{{{REDACTED-%t}}}"
            }
         ]
      },
      "emailAddress":{
         "emailAddressFilterStrategies":[
            {
               "strategy":"REDACT",
               "redactionFormat":"{{{REDACTED-%t}}}"
            }
         ]
      },
      "identifier":{
         "identifierFilterStrategies":[
            {
               "strategy":"REDACT",
               "redactionFormat":"{{{REDACTED-%t}}}"
            }
         ]
      },
      "ipAddress":{
         "ipAddressFilterStrategies":[
            {
               "strategy":"REDACT",
               "redactionFormat":"{{{REDACTED-%t}}}"
            }
         ]
      },
      "phoneNumber":{
         "phoneNumberFilterStrategies":[
            {
               "strategy":"REDACT",
               "redactionFormat":"{{{REDACTED-%t}}}"
            }
         ]
      },
      "ssn":{
         "ssnFilterStrategies":[
            {
               "strategy":"REDACT",
               "redactionFormat":"{{{REDACTED-%t}}}"
            }
         ]
      },
      "url":{
         "urlFilterStrategies":[
            {
               "strategy":"REDACT",
               "redactionFormat":"{{{REDACTED-%t}}}"
            }
         ]
      },
      "vin":{
         "vinFilterStrategies":[
            {
               "strategy":"REDACT",
               "redactionFormat":"{{{REDACTED-%t}}}"
            }
         ]
      },
      "zipCode":{
         "zipCodeFilterStrategies":[
            {
               "strategy":"REDACT",
               "redactionFormat":"{{{REDACTED-%t}}}"
            }
         ]
      }
   }
}
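To check which identifier types a profile enables before deploying it, the profile can be read with any JSON library. The following is a minimal sketch, not part of Philter: the helper summarize_profile is hypothetical, and it assumes that strategy lists sit under keys ending in FilterStrategies, as in the sample above. The embedded profile is an abbreviated copy of the sample.

```python
import json

# Abbreviated copy of the sample profile (two identifier types only).
PROFILE_JSON = """
{
   "name": "default",
   "identifiers": {
      "ssn": {
         "ssnFilterStrategies": [
            {"strategy": "REDACT", "redactionFormat": "{{{REDACTED-%t}}}"}
         ]
      },
      "date": {
         "dateFilterStrategies": [
            {"strategy": "REDACT", "redactionFormat": "{{{REDACTED-%t}}}"}
         ]
      }
   }
}
"""


def summarize_profile(profile: dict) -> dict:
    """Map each enabled identifier type to the strategy names it uses."""
    summary = {}
    for id_type, settings in profile.get("identifiers", {}).items():
        strategies = []
        for key, value in settings.items():
            # Assumption: strategy lists live under keys ending in "FilterStrategies".
            if key.endswith("FilterStrategies") and isinstance(value, list):
                strategies.extend(item["strategy"] for item in value)
        summary[id_type] = strategies
    return summary


profile = json.loads(PROFILE_JSON)
print(profile["name"], summarize_profile(profile))
```

Any identifier type missing from the summary is one that Philter would not look for when using this profile.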

A Filter Profile

A Filter Profile is a JSON file. A simple, yet valid, filter profile is shown below. It instructs Philter to identify only social security numbers (SSNs) and to replace each one found with the text ***REDACTED-ssn***. Philter replaces the %t in the redaction format with the type of filter. In this case the replacement is ssn since that is the type being filtered.

{  
   "name":"justssn",
   "identifiers":{  
      "ssn":{  
         "ssnFilterStrategy":{  
            "strategy":"REDACT",
            "redactionFormat":"***REDACTED-%t***"
         }
      }
   }
}

This filter profile has the name justssn. The name can be any combination of alphanumeric characters but the name must be unique from all other filter profiles. Save this JSON as justssn.json in Philter’s profiles directory. Philter must be restarted for any new or changed filter profiles to be loaded.
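Because names must be alphanumeric and unique, it can help to check a directory of profiles before restarting Philter. The sketch below is an illustration only: check_profile_names is a hypothetical helper, and a temporary directory stands in for Philter’s profiles directory.

```python
import json
import re
import tempfile
from pathlib import Path

NAME_PATTERN = re.compile(r"[A-Za-z0-9]+")


def check_profile_names(profiles_dir: Path) -> list:
    """Return a list of naming problems found in the *.json files of profiles_dir."""
    problems = []
    seen = {}  # profile name -> file that first declared it
    for path in sorted(profiles_dir.glob("*.json")):
        name = json.loads(path.read_text(encoding="utf-8")).get("name", "")
        if not NAME_PATTERN.fullmatch(name):
            problems.append(f"{path.name}: name {name!r} is not alphanumeric")
        elif name in seen:
            problems.append(f"{path.name}: name {name!r} already used by {seen[name]}")
        else:
            seen[name] = path.name
    return problems


# Demonstration: two files that both declare the name "justssn".
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)
    (demo_dir / "justssn.json").write_text('{"name": "justssn", "identifiers": {}}')
    (demo_dir / "copy.json").write_text('{"name": "justssn", "identifiers": {}}')
    demo_problems = check_profile_names(demo_dir)
    print(demo_problems)
```

An empty list means every profile in the directory has a distinct, alphanumeric name.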

The filter profile named default (saved as default.json) has a special meaning. When it exists, Philter falls back to it whenever no filter profile is specified in the API request. See the API for more information.

Once Philter has been restarted, the justssn filter profile can be used. To do so, pass the filter profile name to Philter when making a filter request, as shown below.

# -k skips TLS certificate verification (for example, with a self-signed certificate).
curl -k -X POST "https://localhost:8080/api/filter?c=context&p=justssn" \
  -d @file.txt -H "Content-Type: text/plain"

In this command, we have provided the parameter p along with a value that is the name of the filter profile we want to use for this request. If we had multiple filter profiles in Philter we could choose a different filter profile for this request simply by changing the name given to the parameter p. For more details see Philter’s API.
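The same request can be built from a program. The following sketch uses only Python’s standard library and mirrors the command above. The function build_filter_request is a hypothetical helper, and the host, port, and context value are simply those of the example. The request is constructed but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_filter_request(base_url: str, text: str, profile: str,
                         context: str = "context") -> Request:
    """Build a POST request for the filter endpoint with the profile name in p."""
    query = urlencode({"c": context, "p": profile})
    return Request(
        url=f"{base_url}/api/filter?{query}",
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


request = build_filter_request(
    "https://localhost:8080", "His SSN is 123-45-6789.", "justssn"
)
print(request.get_method(), request.full_url)

# To send the request, pass it to urllib.request.urlopen(). Note that curl's -k
# flag has no default equivalent there: skipping certificate verification
# requires passing an explicitly configured ssl.SSLContext.
```

Switching profiles for a given document is then a matter of changing the profile argument.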

Replacement Strategies

A replacement strategy defines how PHI identified by Philter should be manipulated.

In a filter profile you specify how Philter should identify and replace PHI and PII. In the example in the section above, the filter profile only identified social security numbers. However, we can make a filter profile to identify as many (or as few) of the HIPAA PHI identifiers as we need to. How Philter replaces each type of PHI is specific to each individual type of PHI. For instance, zip codes can be truncated based on the leading digits or zip code population. This method of value manipulation is not relevant to other identifiers.

Each type of PHI listed in a filter profile has a strategy associated with it. This strategy tells Philter how to manipulate that text when identified as PHI. For example, in the social security number example given above, the strategy is REDACT, which redacts the text per the given redactionFormat.

Redaction Formats

Strategy | Description
REDACT | Replaces the identified text with a set pattern given by redactionFormat. Redaction variables are available to customize the redaction text at runtime. See the Redaction Format Variables section below.
RANDOM_REPLACE | Replaces the identified text with a fake value but of the same type. For example, an SSN will be replaced by a random text having the format ###-##-####, such as 123-45-6789. An email address will be replaced with an automatically generated random email address.
STATIC_REPLACE | Replaces the identified text with a given static value.
TRUNCATE | Currently available only to zip codes, this strategy allows for truncating zip codes to only a select number of digits. Specify truncateDigits to set the desired number of leading digits to leave. For example, if truncateDigits is 2, the zip code 90210 will be truncated to 90***.
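To make the differences between the strategies concrete, the sketch below imitates each one on a single token. It illustrates the described behavior and is not Philter’s implementation. The function apply_strategy is hypothetical, the staticReplacement key is an assumption (the text above does not name the property that holds the static value), and RANDOM_REPLACE is simplified to swapping digits for random digits.

```python
import random


def apply_strategy(token: str, filter_type: str, strategy: dict) -> str:
    """Return the replacement text for token under the given strategy settings."""
    name = strategy["strategy"]
    if name == "REDACT":
        # Substitute the filter type for %t in the redaction format.
        return strategy["redactionFormat"].replace("%t", filter_type)
    if name == "STATIC_REPLACE":
        # Assumption: the static value is stored under "staticReplacement".
        return strategy["staticReplacement"]
    if name == "TRUNCATE":
        # Keep the leading digits and mask the rest with asterisks.
        keep = strategy["truncateDigits"]
        return token[:keep] + "*" * (len(token) - keep)
    if name == "RANDOM_REPLACE":
        # Simplification: keep the token's shape, randomizing only its digits.
        return "".join(
            random.choice("0123456789") if ch.isdigit() else ch for ch in token
        )
    raise ValueError(f"unknown strategy: {name}")


print(apply_strategy("90210", "zip-code", {"strategy": "TRUNCATE", "truncateDigits": 2}))
print(apply_strategy("123-45-6789", "ssn",
                     {"strategy": "REDACT", "redactionFormat": "***REDACTED-%t***"}))
print(apply_strategy("123-45-6789", "ssn", {"strategy": "RANDOM_REPLACE"}))
```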

Redaction Format Variables

You can put variables in the redaction format that Philter will replace when performing the redaction. The available variables are:

  • The value %t will be replaced with the type of PHI/PII. This is to allow you to know the type of PHI that was identified and redacted.
  • The value %l will be replaced by the given label for the type of PHI/PII. This variable only applies to custom identifier types.
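The substitution of these variables can be pictured as plain string replacement. The sketch below is an illustration under assumptions: render_redaction is a hypothetical function, the bracketed format and the type name id in the second call are made up for the example, and rendering %l as an empty string when no label is given is a guess rather than documented behavior.

```python
def render_redaction(redaction_format: str, filter_type: str, label: str = "") -> str:
    """Fill in %t (filter type) and %l (custom identifier label) in a redaction format."""
    return redaction_format.replace("%t", filter_type).replace("%l", label)


# A built-in type only needs %t.
print(render_redaction("***REDACTED-%t***", "ssn"))

# A custom identifier can also expose its label through %l.
print(render_redaction("[%t:%l]", "id", label="my-custom-identifier"))
```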

Conditions

A replacement strategy can be applied only when the original PHI/PII value meets one or more conditions. For example, the condition token == "11/05/2010" causes only dates of 11/05/2010 to be replaced. The conditions that can be applied vary based on the type of PHI/PII. For instance, zip codes can have conditions based on their population. The following is an example filter profile for credit cards that contains a condition to redact only credit card numbers that start with the digits 3000:

{
   "name":"default",
   "identifiers":{
      "creditCard":{
         "creditCardFilterStrategies":[
            {
               "condition":"token startswith \"3000\"",
               "strategy":"REDACT",
               "redactionFormat":"{{{REDACTED-%t}}}"
            }
         ]
      }
   }
}
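To show how such a condition narrows what gets replaced, the sketch below evaluates the three condition forms that appear in this article: token ==, token startswith, and population >. It is a toy evaluator, not Philter’s condition parser. The function matches is hypothetical, and the population figure passed in the example is made up.

```python
import re

# field, operator, then either a quoted string or an integer
CONDITION = re.compile(
    r'^\s*(token|population)\s+(==|>|startswith)\s+(?:"([^"]*)"|(\d+))\s*$'
)


def matches(condition: str, token: str, population=None) -> bool:
    """Return True if the identified token satisfies the condition."""
    parsed = CONDITION.match(condition)
    if parsed is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    field, operator, text_value, number_value = parsed.groups()
    if field == "token" and text_value is not None:
        if operator == "==":
            return token == text_value
        if operator == "startswith":
            return token.startswith(text_value)
    if field == "population" and operator == ">" and number_value is not None:
        return population is not None and population > int(number_value)
    raise ValueError(f"unsupported condition: {condition!r}")


condition = 'token startswith "3000"'
print(matches(condition, "3000123412341234"))   # card number begins with 3000
print(matches(condition, "4136033768658155"))   # card number begins with 4136
print(matches("population > 2000", "90210", population=5000))
```

A strategy carrying the condition would be applied only to tokens for which the check returns True.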

Listing of Conditions per PHI/PII Type

PHI/PII Type | Available Conditions | Examples
Age | token | token == "23yrs"
Credit Card | token | token == "4136033768658155"
City | token | token == "Bridgeville"
County | token | token == "Aiken"
Date | token | token == "02/23/2003"
Email Address | token | token == "john.fake@hotmail.com"
First Name | token | token == "John"
Hospital | token | token == "General Hospital"
Hospital Abbreviation | token | token == "GH"
Identifier | token | token == "MT10933"
IP Address | token | token == "192.168.1.23"
Entity | token | token == "John Smith"
Phone Number | token | token == "800-123-4567"
Phone Number Extension | token | token == "x123"
SSN | token | token == "123-45-6789"
State Abbreviation | token | token == "OH"
State | token | token == "Ohio"
Surname | token | token == "Smith"
URL | token | token == "http://www.fakesite.com"
VIN | token | token == "1VWCT7A37EC163642"
Zip Codes | population, token | population > 2000; token == "90210"

Filter Profile Reference

If a type of PHI is omitted in the filter profile then Philter will not look for that type of PHI when processing requests. Only types of PHI listed in a filter profile along with a given strategy will be identified by Philter.

Note that these types of PHI do not necessarily map 1-to-1 with HIPAA PHI types. To allow more granular control, we have broken some of those PHI types down into smaller units, such as geographic locations into states and cities, and names into first names and surnames. Refer to PHI and PII for examples of the data identified by the types below.

The full list of types and the strategies available for each is given below. Each type is a single object except for identifiers, which is a list. This allows you to create custom identifiers from custom regular expression patterns, each with its own replacement strategy.

Type Available Strategies Examples
age
  • REDACT
  • RANDOM_REPLACE
  • STATIC_REPLACE
To redact ages:

"age":{  
  "ageFilterStrategy":{  
    "strategy":"REDACT",
    "redactionFormat":"***REDACTED-%t***"
  }
}
credit-card
  • REDACT
  • RANDOM_REPLACE
  • STATIC_REPLACE
To redact credit cards:

"credit-card":{  
  "creditCardFilterStrategy":{  
    "strategy":"REDACT",
    "redactionFormat":"***REDACTED-%t***"
  }
}
city
  • REDACT
  • RANDOM_REPLACE
  • STATIC_REPLACE
To redact cities:

"city":{  
  "cityFilterStrategy":{  
    "strategy":"REDACT",
    "redactionFormat":"***REDACTED-%t***"
  }
}
state
  • REDACT
  • RANDOM_REPLACE
  • STATIC_REPLACE
To redact states:

"state":{  
  "stateFilterStrategy":{  
    "strategy":"REDACT",
    "redactionFormat":"***REDACTED-%t***"
  }
}
county
  • REDACT
  • RANDOM_REPLACE
  • STATIC_REPLACE
To redact counties:

"county":{  
  "countyFilterStrategy":{  
    "strategy":"REDACT",
    "redactionFormat":"***REDACTED-%t***"
  }
}
date
  • REDACT
  • RANDOM_REPLACE
  • STATIC_REPLACE
To redact dates:

"date":{  
  "dateFilterStrategy":{  
    "strategy":"REDACT",
    "redactionFormat":"***REDACTED-%t***"
  }
}
email-address
  • REDACT
  • RANDOM_REPLACE
  • STATIC_REPLACE
To redact email addresses:

"email-address":{  
  "emailAddressFilterStrategy":{  
    "strategy":"REDACT",
    "redactionFormat":"***REDACTED-%t***"
  }
}
first-name
  • REDACT
  • RANDOM_REPLACE
  • STATIC_REPLACE
To redact first names:

"first-name":{  
  "firstNameFilterStrategy":{  
    "strategy":"REDACT",
    "redactionFormat":"***REDACTED-%t***"
  }
}
hospital
  • REDACT
  • RANDOM_REPLACE
  • STATIC_REPLACE
To redact hospitals:

"hospital":{  
  "hospitalFilterStrategy":{  
    "strategy":"REDACT",
    "redactionFormat":"***REDACTED-%t***"
  }
}
hospital-abbreviation
  • REDACT
  • RANDOM_REPLACE
  • STATIC_REPLACE
To redact hospital abbreviations:

"hospital-abbreviation":{  
  "hospitalAbbreviationFilterStrategy":{  
    "strategy":"REDACT",
    "redactionFormat":"***REDACTED-%t***"
  }
}
id
  • REDACT
  • RANDOM_REPLACE
  • STATIC_REPLACE
To redact identifiers:

"identifiers":[{  
  "idFilterStrategy":{  
    "strategy":"REDACT",
    "redactionFormat":"***REDACTED-%t***"
  }
}]

To redact identifiers based on a custom pattern:

"identifiers":[{  
  "pattern":"[0-9A-Z]{4}",
  "caseSensitive":true,
  "label":"my-custom-identifier",
  "idFilterStrategy":{  
    "strategy":"REDACT",
    "redactionFormat":"***REDACTED-%t***"
  }
}]
ip-address
  • REDACT
  • RANDOM_REPLACE
  • STATIC_REPLACE
To redact IP addresses:

"ip-address":{  
  "ipAddressFilterStrategy":{  
    "strategy":"REDACT",
    "redactionFormat":"***REDACTED-%t***"
  }
}
entity
  • REDACT
  • RANDOM_REPLACE
  • STATIC_REPLACE
To redact entities:

"entity":{  
  "entityFilterStrategy":{  
    "strategy":"REDACT",
    "redactionFormat":"***REDACTED-%t***"
  }
}
phone-number
  • REDACT
  • RANDOM_REPLACE
  • STATIC_REPLACE
To redact phone numbers:

"phone-number":{  
  "phoneNumberFilterStrategy":{  
    "strategy":"REDACT",
    "redactionFormat":"***REDACTED-%t***"
  }
}
phone-number-extension
  • REDACT
  • RANDOM_REPLACE
  • STATIC_REPLACE
To redact phone number extensions:

"phone-number-extension":{  
  "phoneNumberExtensionFilterStrategy":{  
    "strategy":"REDACT",
    "redactionFormat":"***REDACTED-%t***"
  }
}
ssn
  • REDACT
  • RANDOM_REPLACE
  • STATIC_REPLACE
To redact social security numbers:

"ssn":{  
  "ssnFilterStrategy":{  
    "strategy":"REDACT",
    "redactionFormat":"***REDACTED-%t***"
  }
}
state-abbreviation
  • REDACT
  • RANDOM_REPLACE
  • STATIC_REPLACE
To redact state abbreviations:

"state-abbreviation":{  
  "stateAbbreviationFilterStrategy":{  
    "strategy":"REDACT",
    "redactionFormat":"***REDACTED-%t***"
  }
}
surname
  • REDACT
  • RANDOM_REPLACE
  • STATIC_REPLACE
To redact surnames:

"surname":{  
  "surnameFilterStrategy":{  
    "strategy":"REDACT",
    "redactionFormat":"***REDACTED-%t***"
  }
}
url
  • REDACT
  • RANDOM_REPLACE
  • STATIC_REPLACE
To redact URLs:

"url":{  
  "urlFilterStrategy":{  
    "strategy":"REDACT",
    "redactionFormat":"***REDACTED-%t***"
  }
}
vin
  • REDACT
  • RANDOM_REPLACE
  • STATIC_REPLACE
To redact vehicle identification numbers:

"vin":{  
  "vinFilterStrategy":{  
    "strategy":"REDACT",
    "redactionFormat":"***REDACTED-%t***"
  }
}
zip-code
  • REDACT
  • RANDOM_REPLACE
  • STATIC_REPLACE
  • TRUNCATE
  • ZERO_LEADING
To redact zip codes:

"zip-code":{  
  "zipCodeFilterStrategy":{  
    "strategy":"REDACT",
    "redactionFormat":"***REDACTED-%t***"
  }
}

To truncate zip codes:

"zip-code":{  
  "zipCodeFilterStrategy":{  
    "strategy":"TRUNCATE",
    "truncateDigits":2
  }
}

To zero leading zip code digits:

"zip-code":{  
  "zipCodeFilterStrategy":{  
    "strategy":"ZERO_LEADING"
  }
}
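Before adding a custom identifier such as the [0-9A-Z]{4} pattern shown under id above, it is worth checking what the pattern matches in representative text. The sketch below uses Python’s re module as a stand-in. It assumes the pattern behaves the same way in Philter’s regular expression engine and that caseSensitive set to false corresponds to case-insensitive matching. The function find_custom_identifiers is hypothetical.

```python
import re


def find_custom_identifiers(text: str, pattern: str, case_sensitive: bool = True) -> list:
    """Return every substring of text matched by the custom identifier pattern."""
    flags = 0 if case_sensitive else re.IGNORECASE
    return [match.group(0) for match in re.finditer(pattern, text, flags)]


sample = "Badge AB12 was issued; ab12 was not."

# With caseSensitive true, only the upper-case code is found.
print(find_custom_identifiers(sample, "[0-9A-Z]{4}"))

# Case-insensitive matching also picks up four-letter runs inside ordinary words.
print(find_custom_identifiers(sample, "[0-9A-Z]{4}", case_sensitive=False))
```

The second call shows how a short, unanchored pattern can match far more than intended, which is a reason to keep custom patterns as specific as possible.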