In Idyl E3 2.2.0 we are introducing a feature we call Heuristic Confidence Filtering. Here’s how it works.
As you may (or may not) already know, each entity extraction request can have an associated “confidence threshold value.” Any extracted entity whose confidence is lower than this value is not returned in the entity extraction response. This is useful, but it is a bit of a sledgehammer approach: set the value too low and the response contains too much noise, set it too high and entities are missed.
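To make the threshold behavior concrete, here is a minimal Python sketch. It is an illustration only, not Idyl E3 code: the `Entity` class and the `filter_by_threshold` function are hypothetical names made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    # Hypothetical stand-in for an extracted entity; not an Idyl E3 type.
    text: str
    confidence: float


def filter_by_threshold(entities, threshold):
    """Keep only entities whose confidence is at least the threshold."""
    return [e for e in entities if e.confidence >= threshold]


entities = [
    Entity("George Washington", 0.92),
    Entity("Virginia", 0.71),
    Entity("the", 0.18),
]

# A single cutoff applies to every entity in the request.
kept = filter_by_threshold(entities, 0.75)
```

With a threshold of 0.75 only the first entity is kept. Lowering it to 0.15 keeps all three, including the likely noise, which is the trade-off described above.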
When enabled, heuristic confidence filtering tracks the confidence values of extracted entities separately for each entity model that extracted them. Once a large enough sample of confidence values has been collected for a model, Idyl E3 filters each entity by determining whether its confidence value is significant relative to the mean of the collected values. This provides a way to filter out noise while still receiving important entities.
Note that the confidence threshold value still plays a part when heuristic confidence filtering is enabled. Any entity whose confidence value is greater than or equal to the confidence threshold for that request is always returned, regardless of the outcome of the heuristic filter.
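The interplay between the two filters can be sketched roughly as follows. This is a sketch under stated assumptions, not the Idyl E3 implementation: the class name, the `min_samples` and `max_deviations` parameters, and the specific test (keep an entity if its confidence is no more than a set number of standard deviations below the model's mean) are all assumptions chosen for illustration. The sketch also assumes that, until enough samples exist for a model, only the threshold applies.

```python
import statistics
from collections import defaultdict


class HeuristicConfidenceFilter:
    """Illustrative sketch only; names and the statistical test are assumptions."""

    def __init__(self, min_samples=30, max_deviations=1.0):
        self.min_samples = min_samples        # assumed "large enough" sample size
        self.max_deviations = max_deviations  # assumed tolerance below the mean
        self.samples = defaultdict(list)      # confidence values per entity model

    def accept(self, model_id, confidence, threshold):
        history = self.samples[model_id]
        enough = len(history) >= self.min_samples
        if enough:
            # Statistics are taken from values seen before this entity.
            mean = statistics.fmean(history)
            stdev = statistics.pstdev(history)
        history.append(confidence)

        # At or above the request's threshold: always returned.
        if confidence >= threshold:
            return True
        # Assumption: without enough samples, only the threshold applies.
        if not enough:
            return False
        # Assumed test: keep values not too far below the model's mean.
        return confidence >= mean - self.max_deviations * stdev


heuristic = HeuristicConfidenceFilter(min_samples=30, max_deviations=1.0)

# Warm up one model with 30 confidence values alternating 0.6 and 0.8.
for i in range(30):
    heuristic.accept("person", 0.6 if i % 2 == 0 else 0.8, threshold=0.0)

near_mean = heuristic.accept("person", 0.65, threshold=0.9)
far_below = heuristic.accept("person", 0.20, threshold=0.9)
above_threshold = heuristic.accept("person", 0.95, threshold=0.9)
```

In this sketch an entity at 0.65 is kept even though it falls below the 0.9 threshold, because it sits close to the model's typical confidence, while an entity at 0.20 is dropped. Keeping a separate history per `model_id` mirrors the per-model tracking described above.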
Heuristic confidence filtering does require a bit more computation time for the calculations involved, and some additional memory to store the confidence values, but not to the point where it should be noticeable.
We are excited to offer this feature and we hope that it helps with “entity noise.” We welcome your feedback on how it performs for you! For more information on this feature, refer to the Idyl E3 2.2.0 User Documentation or contact us. Look for Idyl E3 2.2.0 to be available in February 2017.