Predefined Filters

Each filter identifies and redacts a specific type of sensitive information. For example, there are filters for phone numbers, US Social Security numbers, and people’s names. You can enable any combination of filters, depending on the types of sensitive information you need to redact. This section describes the available filters and how to enable and configure each one.

Philter can identify and redact many predefined types of sensitive information. Within a filter profile, each type, or filter, can be enabled or disabled independently of the others.

Person’s Names

Philter uses several methods to identify people’s names.

Type             Description
Person’s Names   Identifies full names using natural language processing analysis methods.
First Names      Identifies common first names.
Surnames         Identifies common surnames.
Physician Names  Identifies physician names.

Other Filters

The following filters for other types of sensitive information are also available.

Type                                  Description
Ages                                  Identifies ages, such as 33.5 years old.
Bank Routing Numbers                  Identifies bank (ABA transit) routing numbers.
Bitcoin Addresses                     Identifies Bitcoin addresses, such as 127NVqnjf8gB9BFAW2dnQeM6wqmy1gbGtv.
Cities                                Identifies common cities.
Counties                              Identifies common counties.
Countries                             Identifies common countries.
Credit Card Numbers                   Identifies VISA, American Express, MasterCard, and Discover credit card numbers.
Currency                              Identifies USD currency.
Dates                                 Identifies dates in many formats, such as May 22, 1999.
Driver’s License Numbers              Identifies driver’s license numbers for all 50 US states.
Email Addresses                       Identifies email addresses.
Hospitals and Hospital Abbreviations  Identifies common hospital names and their abbreviations.
IBAN Codes                            Identifies international bank account numbers.
IP Addresses                          Identifies IPv4 and IPv6 addresses.
MAC Addresses                         Identifies network MAC addresses.
Passport Numbers                      Identifies US passport numbers.
Phone Numbers                         Identifies phone numbers and phone number extensions.
Sections                              Identifies sections in text denoted by start and stop markers.
SSNs and TINs                         Identifies US SSNs and TINs.
States and State Abbreviations        Identifies US state names and abbreviations.
Tracking Numbers                      Identifies UPS, FedEx, and USPS tracking numbers.
URLs                                  Identifies URLs.
VINs                                  Identifies vehicle identification numbers.
Zip Codes                             Identifies US zip codes.
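
As an illustration of the marker-based idea behind the Sections filter, the following Python sketch replaces everything between a start marker and a stop marker. It is only a sketch under stated assumptions: the `redact_sections` function, the BEGIN-PRIVATE and END-PRIVATE markers, and the replacement text are hypothetical, and nothing here reflects how Philter itself implements or configures the filter.

```python
import re

def redact_sections(text, start, stop, replacement="[SECTION]"):
    """Replace each start...stop span, markers included, with the replacement."""
    # Escape the markers so they are matched literally; the non-greedy .*?
    # stops at the first stop marker, and DOTALL lets a section span lines.
    pattern = re.compile(re.escape(start) + r".*?" + re.escape(stop), re.DOTALL)
    return pattern.sub(replacement, text)

# Hypothetical markers chosen for this example only.
note = "Visit summary. BEGIN-PRIVATE Family history follows. END-PRIVATE Follow up in 2 weeks."
print(redact_sections(note, "BEGIN-PRIVATE", "END-PRIVATE"))
```

A section that has a start marker but no stop marker is left untouched by this sketch, because the pattern requires both markers to be present.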

Custom Filter Types of Sensitive Information

In addition to the predefined types of sensitive information listed in the table above, you can define your own types. Through custom identifiers and dictionaries, Philter can identify many other types of information that may be sensitive in your use case. For example, if your patient identifiers follow the pattern AA-00000, you can define a custom identifier for them.
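
To make the AA-00000 example concrete, the following Python sketch shows a regular expression that matches identifiers of that shape. It assumes that "AA" stands for two uppercase letters and "00000" for five digits. The `redact_patient_ids` helper and the replacement text are hypothetical and are not part of Philter's configuration or API.

```python
import re

# Assumed reading of AA-00000: two uppercase letters, a hyphen, five digits.
# The word boundaries keep the pattern from matching inside longer tokens.
PATIENT_ID = re.compile(r"\b[A-Z]{2}-\d{5}\b")

def redact_patient_ids(text: str, replacement: str = "[PATIENT-ID]") -> str:
    """Replace every identifier shaped like AA-00000 with the replacement."""
    return PATIENT_ID.sub(replacement, text)

print(redact_patient_ids("Patient AB-12345 was seen today."))
```

A pattern like this is a reasonable starting point for testing a custom identifier against sample text before adding it to a filter profile.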

Philter can be configured to identify sensitive information based on custom dictionaries. When a term in the dictionary is found in the text, Philter treats the term as sensitive information and applies the given replacement strategy.

Custom dictionaries support fuzzy matching to accommodate misspellings. The replacement strategy for a custom dictionary has a sensitivityLevel that controls how much fuzziness is allowed.
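
The effect of fuzziness on dictionary matching can be sketched with edit distance. The Python example below is an illustration only: this section does not say which algorithm Philter uses or how a sensitivityLevel value maps to an amount of fuzziness, so the `max_distance` parameter, the function names, and the sample dictionary are all assumptions made for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row of the DP table at a time."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def find_dictionary_terms(text, dictionary, max_distance):
    """Return (word, matched term) pairs that are within max_distance edits."""
    matches = []
    for token in text.split():
        word = token.strip(".,;:!?").lower()
        for term in dictionary:
            if edit_distance(word, term.lower()) <= max_distance:
                matches.append((word, term))
                break  # report each word at most once
    return matches

# Hypothetical dictionary; "metformn" is a deliberate misspelling.
terms = ["metformin", "lisinopril"]
print(find_dictionary_terms("Patient takes metformn daily.", terms, max_distance=1))
```

With `max_distance=0` the misspelled word is no longer matched, which is the trade-off a fuzziness setting controls: a larger allowance catches more misspellings but also raises the chance of flagging unrelated words.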

Type                 Description
Custom Dictionaries  Identifies sensitive information based on dictionary values.
Custom Identifiers   Identifies custom alphanumeric identifiers, such as medical record numbers, patient identifiers, account numbers, or other specific identifiers.