Distribution of Entity Confidence Values in a Sample Data Set

In a previous post titled Tuning the Confidence Threshold Parameter we described how the confidence threshold parameter can be used to control the strictness of the entity extraction. We would like to now give a little more insight into the parameter.

We recently extracted entities from more than 500,000 documents with Idyl. These documents were mostly news and news-like articles. (We say “news-like” because some did not follow the traditional format of a news article.) During the extraction we tracked the confidence value of each entity. When the processing was complete we randomly selected 10,000 of the entities and produced the histogram of the confidence values shown below. (The Y-axis is the number of entities having the confidence value on the X-axis.)

As the histogram shows, nearly all of the entities extracted had a confidence value greater than 50. In our spot checks, none of the entities with a confidence value less than 50 were actual entities, so they could be discarded. (They included things like abbreviations.) Between 60 and 80 the entities were more reliable, with about 75% of them being actual entities. Nearly all entities extracted with a confidence value greater than 80 were actual entities. We only spot-checked the extracted entities in this investigation, but in a follow-up post we will provide numbers and percentages.

The takeaway from all this is that a confidence threshold of 80 is probably a safe choice. You can, of course, always tweak the value later if you find that you need to.
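For readers who want to produce a similar histogram from their own results, here is a minimal sketch in Java of how confidence values can be tallied into bins. The class and method names are our own for illustration and are not part of Idyl or its SDKs, and the sample values are made up rather than taken from the measured data.

```java
public class ConfidenceHistogram {

    // Counts how many confidence values (0 to 100) fall into each bin of the given width.
    // A value of exactly 100 is placed in the last bin.
    static int[] histogram(double[] confidences, int binWidth) {
        int binCount = (int) Math.ceil(100.0 / binWidth);
        int[] bins = new int[binCount];
        for (double c : confidences) {
            if (c < 0 || c > 100) {
                throw new IllegalArgumentException("Confidence out of range: " + c);
            }
            int index = Math.min((int) (c / binWidth), binCount - 1);
            bins[index]++;
        }
        return bins;
    }

    public static void main(String[] args) {
        // Made-up values for illustration only; these are not the measured data.
        double[] sample = {42.0, 55.5, 61.2, 67.9, 72.4, 78.0, 83.3, 88.1, 91.7, 85.0};
        int[] bins = histogram(sample, 10);
        for (int i = 0; i < bins.length; i++) {
            System.out.printf("%3d-%3d: %d%n", i * 10, i * 10 + 10, bins[i]);
        }
    }
}
```

Looking at where the counts start to climb in your own data is one way to sanity-check a threshold before settling on it.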

Thanks for reading!

OpenSSL “Heartbleed” Vulnerability

Our systems were upgraded to the patched versions of OpenSSL earlier in the week and we re-keyed our SSL certificate. We recommend that all users change their passwords and generate new API keys.

Tuning the Confidence Threshold Parameter

When you start using Idyl you’ll see the confidence threshold parameter when extracting entities. In this post we want to shed some light on this parameter.

When Idyl looks for entities it is not a binary “yes or no” operation. The Idyl engine will have more confidence that some words or phrases constitute entities and less confidence in others. With the confidence threshold parameter you can tell Idyl to not extract any entities if Idyl’s confidence level in the entity is less than the value you provide.

Valid values for the confidence threshold parameter range from 0 to 100. Keep in mind that entities rarely, if ever, achieve a 100% confidence level. Most will fall in the 60-90% range, but it really depends on your text and can vary (described below). If you need Idyl to return entities with a lower confidence level you can just change the confidence threshold parameter in your API request. If you don’t specify a value for the confidence threshold parameter it will default to 0, meaning that all identified entities will be returned.

Which confidence threshold value to use depends upon your data. If you start with a value of 60 and notice that some entities are not being detected, try lowering the value. The Idyl Demo uses a value of 0, so you can use it to see Idyl’s confidence level for a sample of your input.
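To make the behavior concrete, here is a minimal sketch in Java of the filtering rule described above: entities whose confidence is less than the threshold are dropped, and a threshold of 0 keeps everything. This only illustrates the rule and is not Idyl's actual implementation. The ScoredEntity class and the applyThreshold method are hypothetical names, and the confidence values are invented.

```java
import java.util.ArrayList;
import java.util.List;

public class ConfidenceThresholdFilter {

    // Hypothetical stand-in for an extracted entity; not the SDK's Entity class.
    static final class ScoredEntity {
        final String text;
        final double confidence;

        ScoredEntity(String text, double confidence) {
            this.text = text;
            this.confidence = confidence;
        }
    }

    // Keeps only the entities whose confidence is not less than the threshold.
    static List<ScoredEntity> applyThreshold(List<ScoredEntity> entities, double threshold) {
        if (threshold < 0 || threshold > 100) {
            throw new IllegalArgumentException("Threshold must be between 0 and 100.");
        }
        List<ScoredEntity> kept = new ArrayList<>();
        for (ScoredEntity entity : entities) {
            if (entity.confidence >= threshold) {
                kept.add(entity);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Invented confidence values for illustration only.
        List<ScoredEntity> entities = new ArrayList<>();
        entities.add(new ScoredEntity("John Smith", 91.0));
        entities.add(new ScoredEntity("Springfield", 72.0));
        entities.add(new ScoredEntity("Inc", 45.0));

        for (double threshold : new double[] {0, 60, 80}) {
            System.out.println("Threshold " + threshold + " keeps "
                    + applyThreshold(entities, threshold).size() + " entities");
        }
    }
}
```

Running the same input through a few thresholds like this is a quick way to see how many entities a stricter setting would cost you.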

We hope this provides some insight into the purpose and function of the confidence threshold value. If you have any questions please comment or shoot an email to support@mtnfog.com.

Updates to Idyl SDKs

Recently we announced that a subset of Idyl’s APIs is available on Mashape. Currently, only Idyl’s entity extraction API and language detection API are exposed through Mashape. We have updated the Idyl SDKs to be able to use the Mashape API endpoints. Example code snippets are shown in Mashape’s readme for Idyl.

Happy coding!

Idyl Engine Update

In the past week we deployed an update to Idyl’s querying engine. This update greatly improves the performance of executing SPARQL queries. Queries on small entity contexts probably won’t see much improvement but users with large numbers of entities will notice improvements in query response time.

New Support Help Desk!

As part of our effort to give you a better experience we have just migrated our customer support processes to a new help desk. We believe the capabilities and features provided by the new help desk will help us help you better. (We all win!)

What does this mean to you? You can now create support tickets at https://mtnfog.freshdesk.com. If you email support@mtnfog.com with a support request we will create a help desk ticket on your behalf.

Over the next few weeks we will be populating the help desk’s FAQs and solutions with the goal of documenting many common problems, questions, and solutions.

If you have any comments or questions please always feel free to drop us a line at support@mtnfog.com.

Using the Idyl SDK with Maven

The Idyl API is described on the api page. The Idyl interface is just a set of REST webservices. To reduce the time necessary to develop for Idyl we have created wrappers for it in both Java and .NET. Here’s a quick look at how to use the Java SDK with Maven.

First, add our repository to your pom.xml:

<repository>
    <id>mtnfog-repo</id>
    <name>Mountain Fog Repository</name>
    <url>http://content.mtnfog.com/sdk/idyl-saas/release/</url>
</repository>

Next, add the Idyl SaaS SDK dependency:

<dependency>
    <groupId>com.mtnfog.sdk</groupId>
    <artifactId>idyl-saas-java-sdk</artifactId>
    <version>1.0.0</version>
</dependency>

Now with that done you can move on to the fun stuff. Here’s a snippet of using the SDK for extracting entities:

// Set your Idyl API key.
final String apiKey = "HFPL37MZAP03JFXS";

// Set the text to be sent to Idyl.
final String sentence = "John Smith is a person.";

IdylClient idylClient = new IdylClient(apiKey);

ExtractEntitiesRequest request = new ExtractEntitiesRequest(sentence);

// If you want to correlate entities set the context and optionally the doc id:
// request.setContext("contextA");
// request.setDocId("document1");

ExtractEntitiesResponse response = idylClient.extractEntities(request);

// Show the http status code.
System.out.println("Http status code: " + response.getHttpResponseCode());

// Check the extracted entities.
System.out.println("Extracted entities: " + response.getEntities().size());

// Loop over the entities.
for (Entity entity : response.getEntities()) {
    System.out.println("Entity: " + entity.getEntity() + ", Type: " + entity.getType());
}

All you need to do is replace the example API key with your Idyl API key and set your sentence value. And that is all. The request will be sent to Idyl and the extracted entities will be printed.

If you want to store the extracted entities so you can query over them later, uncomment the two lines that set the context and document ID and set your values. Think of the context as the name for a collection of documents and the document ID as the name for a single document. For example, if you were extracting entities from books, the context could be the type of book (fiction, nonfiction) or the author, and the document ID could be each book’s title. Now with stored entities you can use the SDK to query those entities. The code follows the same pattern as above:

final String apiKey = "HFPL37MZAP03JFXS";
final String query = "SELECT ?entity WHERE { <https://mtnfog.com/idyl/testcontext/doc1> <https://mtnfog.com/idyl/contains> ?entity . }";

IdylClient idylClient = new IdylClient(apiKey);

QueryRequest request = new QueryRequest(query);

QueryResponse response = idylClient.query(request);

// Show the http status code.
System.out.println("Http status code: " + response.getHttpResponseCode());

// The result of the query will be an RDF/XML string.
System.out.println(response.getRdfXmlOutput());

This code executes a SPARQL query on your entities that simply returns all entity names under the context testcontext and document ID doc1. The query will be sent to Idyl and the returned entities will be printed.
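If you query many documents, you may prefer to build the query string from the context and document ID rather than writing it by hand. Here is a minimal sketch in Java. The ContainsQueryBuilder class and buildContainsQuery method are hypothetical helpers and not part of the SDK. The URI layout is copied from the single example query above, so treating it as a general pattern for other contexts and documents is an assumption.

```java
public class ContainsQueryBuilder {

    // Base URI copied from the example query; assumed to be the same for other contexts.
    static final String BASE = "https://mtnfog.com/idyl/";

    // Builds a query that selects every entity contained in one document of one context.
    static String buildContainsQuery(String context, String docId) {
        requireSimpleName(context, "context");
        requireSimpleName(docId, "docId");
        return "SELECT ?entity WHERE { <" + BASE + context + "/" + docId + "> <"
                + BASE + "contains> ?entity . }";
    }

    // Rejects characters that could break out of the <...> IRI in the query.
    // The allowed character set is a conservative choice made for this sketch.
    private static void requireSimpleName(String value, String label) {
        if (value == null || !value.matches("[A-Za-z0-9_.-]+")) {
            throw new IllegalArgumentException("Invalid " + label + ": " + value);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildContainsQuery("testcontext", "doc1"));
    }
}
```

The resulting string can be passed to QueryRequest exactly as in the snippet above.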

Using the .NET SDK is very similar, and we will describe it in a future post.

Welcome!

Welcome to our new blog! We’re excited to blog and share information about our services with you. Right now we are busy getting Idyl ready for prime time. Idyl will soon be open for beta users. If you would like to be notified when it is ready please let us know.

We’re also on Twitter! Send us a tweet @mtnfog.