Handling Duplicate Entities

When performing entity extraction, it is common for a single extraction request to return duplicate entities. For example, given the input:

George Washington was president. George Washington was married to Martha.

Idyl E3 may return the following entities:

  • George Washington – person – 86% confidence
  • George Washington – person – 89% confidence

The entity “George Washington” is a duplicate because its entity text and entity type match those of at least one other entity in the same entity extraction response. New in Idyl E3 2.4.0, you can choose how duplicate entities are handled. The default behavior, which is unchanged from previous versions, is to return all entities, whether or not they are duplicates. A new option is to return only the duplicate having the highest confidence. For example, given the entities above, Idyl E3 would return only the “George Washington” entity with 89% confidence, and the duplicate with 86% confidence would be discarded.
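The “highest confidence” option can be illustrated with a minimal Python sketch. It is not Idyl E3’s implementation or API: the Entity class and the keep_highest function are hypothetical, confidence is assumed to be a value between 0 and 1, and ties are assumed to keep the first entity seen, because the behavior for equal confidences is not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    # Hypothetical entity record for illustration; not Idyl E3's API.
    text: str
    entity_type: str
    confidence: float  # assumed to be in the range 0.0 to 1.0


def keep_highest(entities: list[Entity]) -> list[Entity]:
    """Keep one entity per (text, type) pair: the one with the highest confidence.

    Results follow the order in which each (text, type) pair first appears.
    On a tie, the entity seen first is kept (an assumption for this sketch).
    """
    best: dict[tuple[str, str], Entity] = {}
    for entity in entities:
        key = (entity.text, entity.entity_type)
        current = best.get(key)
        if current is None or entity.confidence > current.confidence:
            # Replacing the value of an existing key keeps its original position.
            best[key] = entity
    return list(best.values())


if __name__ == "__main__":
    extracted = [
        Entity("George Washington", "person", 0.86),
        Entity("George Washington", "person", 0.89),
    ]
    print(keep_highest(extracted))
    # [Entity(text='George Washington', entity_type='person', confidence=0.89)]
```

Grouping on the (text, type) pair rather than on the text alone follows the definition of a duplicate given above: the same text tagged with a different entity type is kept as a separate entity.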

The “Duplicate Entity Handling Strategy” is controlled via the duplicate.entity.handling.strategy property in Idyl E3’s configuration file. The valid values are:

  • retain – All entities are returned. This is the default behavior.
  • highest – When duplicate entities are present in a single entity extraction request, only the duplicate having the highest confidence value is returned.
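To make the two values concrete, the sketch below reads the property from configuration text and falls back to retain when the property is absent. It assumes the configuration file uses a Java-style key=value properties format; the read_strategy helper, the sample file contents, and the choice to reject unknown values with an error are hypothetical and are not Idyl E3 behavior confirmed by this document.

```python
PROPERTY = "duplicate.entity.handling.strategy"
VALID_STRATEGIES = {"retain", "highest"}
DEFAULT_STRATEGY = "retain"  # documented default: return all entities


def read_strategy(config_text: str) -> str:
    """Return the duplicate entity handling strategy from key=value config text.

    Assumes a simple properties-style format: one key=value pair per line,
    with lines starting with '#' or '!' treated as comments.
    """
    strategy = DEFAULT_STRATEGY
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == PROPERTY:
            # A later assignment overrides an earlier one.
            strategy = value.strip()
    if strategy not in VALID_STRATEGIES:
        # How Idyl E3 reacts to an invalid value is not stated; this sketch rejects it.
        raise ValueError(f"invalid value for {PROPERTY}: {strategy!r}")
    return strategy


if __name__ == "__main__":
    sample_config = """
# Hypothetical excerpt from a configuration file
duplicate.entity.handling.strategy=highest
"""
    print(read_strategy(sample_config))  # highest
    print(read_strategy(""))  # retain
```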

In summary, the new duplicate.entity.handling.strategy property controls how duplicate entities are handled within each entity extraction request. This property will be available in Idyl E3 2.4.0 and is described in the Idyl E3 2.4.0 configuration documentation.