Philter and Elasticsearch

PhilterPhilter, our application for finding and removing PII and PHI from natural language text, has the ability to optionally store the identified text in an external data store. With this feature, you had access to a complete log of Philter’s actions as well as the ability to reconstruct the original text in the future if you ever needed to.

In Philter 1.0,  we chose MongoDB as the external data store. With just a few configuration properties, Philter would connect to MongoDB and persist all identified “spans” (the identified text, its location in the document, and some other attributes) to a MongoDB database. This worked well but we realized that looking forward it might not have been the best choice.

In Philter 1.1 we are replacing MongoDB with Elasticsearch. The functionality and the Philter APIs will remain the same. The only difference is that now instead of the spans being stored in a MongoDB database they will now be stored in an Elasticsearch index. So, what, exactly are the benefits? Great question.

The first benefit comes with Elasticsearch and Kibana’s ability to quickly and easily make dashboards to view the indexed data. With the spans in Elasticsearch, you can make a dashboard to summarize the spans by type, text, etc., to show insights into the PII and PHI that Philter is finding and manipulating in your text.

It also became quickly apparent that a primary use-case for users and the store would be to query the spans it contains. For example, a query to find all documents containing “John Doe” or all documents containing a certain date or phone number. A search engine is better prepared to handle those queries.

Another consideration is licensing. Elasticsearch is available under the Apache Software License or a compatible license while MongoDB is available under a Server Side Public License.

In summary, Philter 1.1 will offer support for using Elasticsearch as the store for identified PII and PHI. Remember, using the store is an optional feature of Philter. If you do not require any history of the text that Philter identifies then it is not needed. (By default, Philter’s store feature is disabled and has to be explicitly enabled.) Support for using MongoDB as a store will not be available in Philter 1.1.

We are really excited about this change and excited about the possibilities that comes with it!