Use your own NLP models with Philter

New in Philter 1.5.0 is the ability to use your own custom NLP models with Philter. Available in both Standard and Enterprise editions.

Philter is able to identify named-entities in text through the use of a trained model. The model is able to identify things, like person’s names, in the text that do not follow a well-defined pattern or are easily referenced in a dictionary. Philter’s NLP model is interchangeable and we offer multiple models that you can choose from to better tailor Philter to your use-case and your domain.

However, there are times when using our models may not be sufficient, such as when your use-case does not exactly match our available models or you want to try to get better performance by training a model on text very similar to your input text. In those cases you can train a custom NLP model for use with Philter.

Getting Started

Before diving into the details, here are some examples that illustrate how to make a custom NLP model usable for Philter. Feel free to use these examples as a starting point for your models. Additionally, this capability has been documented in Philter’s User’s Guide.

The first example in that repository is an implementation of a named-entity recognizer using Apache OpenNLP. The entity recognizer is exposed via two endpoints using a Spring Boot REST controller.

  • The first endpoint /process simply passes the received text to the entity recognizer. The entity recognizer receives the text, extracts the entities, and returns them in a list of PhilterSpan objects.
  • The second endpoint /status simply returns HTTP 200 and the text healthy. In cases where your model may take a bit to load, it would be better to actually check the status of the model instead of just returning that it is healthy. But for this example, the load model loading is quick and done before the HTTP service is even available.

That’s the only required endpoints. Philter will use those endpoints to interact with your model.

@RequestMapping(value = "/process", method = RequestMethod.POST, consumes = MediaType.TEXT_PLAIN_VALUE, produces = MediaType.APPLICATION_JSON_VALUE)
public @ResponseBody List<PhilterSpan> process(@RequestBody String text) {
return openNlpNer.extract(text);
}

@RequestMapping(value = "/status", method = RequestMethod.GET)
public ResponseEntity<String> status() {
return new ResponseEntity<String>("healthy", HttpStatus.OK);
}
Click here to see the full source file.

The PhilterSpan object contains the details of a single extracted entity. An example response from the /process endpoint for a single entity is shown below.

[ 
  {
    text: "George Washington",
    tag: "person",
    score: "0.97",
    start: "0",
    end: "17"
  }
]

Custom NLP Models

Training Your Own Model

Philter is indifferent of the technologies and methods you choose to train your custom model. You can use any framework you like, such as Apache OpenNLP, spaCy, Stanford CoreNLP, or your own custom framework. Follow the framework’s documentation for training a model.

Using a Model You Already Have

If you already have a trained NLP model you can skip the training part and proceed on to making the model accessible to Philter.

Using Your Own Model with Philter

Once your model has been trained and you are satisfied with its performance, to use the model with Philter you must expose the model by implementing a simple HTTP service interface around it. This service facilitates communication between Philter and your model. The service is illustrated below.

Once your model is available behind the HTTP interface described above, you are ready to use the model with Philter. On the Philter virtual machine, simply export the PHILTER_NER_ENDPOINT environment variable to be the location of the running HTTP service. It is recommended you set this environment variable in /etc/environment. If your HTTP service is running on the same host as Philter on port 8888, the environment variable would be set as:

export PHILTER_NER_ENDPOINT=http://localhost:8888/

Now restart the Philter service and stop and disable the philter-ner service.

sudo systemctl restart philter.service
sudo systemctl stop philter-ner.service
sudo systemctl disable philter-ner.service

When a filter profile containing an NER filter is applied, requests will be made to your HTTP service invoking your model inference returning the identified named-entities.

Recommendations and Best Practices

You have complete freedom to train your custom NLP model using whatever tools and processes you choose. However, from our experience that are a few things that can help you be successful.
The first recommendation is to contain your service in a Docker container. Doing so gives you a self-contained image that can be deployed and run virtually anywhere. It simplifies dependency management and protects you from dependency version changes.

The second recommendation is to make your HTTP service as lightweight as possible. Avoid any unnecessary code or features that could negatively impact the speed of your model inference.

Lastly, thoroughly evaluate your model prior to deploying the model to Philter to have a better expectation of performance.

Conclusion

Using a custom NLP model with Philter is a fairly straightforward process. Train your model, make it accessible by HTTP, and then deploy the HTTP service and the model such that the service is accessible from Philter.

Do you have to train a custom model to use Philter? Absolutely not. Philter’s “out-of-the-box” capabilities are sufficient for most use-cases. Will using your own model give you better performance? Possibly. It depends on how well the model is trained and all of the parameters used. Training your own model can be a difficult and time consuming activity so it’s best to have some familiarity with the process before starting.

We are excited to offer this feature and look forward to getting your feedback!

 Philter Version 
Launch Philter on AWS1.6.0
Launch Philter on Azure1.6.1
Launch Philter on Google Cloud1.6.0