Almost half of WV geotagged tweets are sent from Morgantown and Huntington

Mountain Fog is a West Virginia company, and as such we take an interest in the social media use of West Virginians. From June 9, 2015, to June 19, 2015, we sampled tweets and divided them into two categories: tweets sent from West Virginia and tweets sent from the other 49 states. Our goal was to compare the two categories for similarities and differences.

We captured approximately 209,000 tweets. Of those, about 800, or about 0.40%, originated in West Virginia. (It is interesting to note that WV’s population represents 0.58% of the United States’ population according to the 2014 census.)

Tweets by City

Almost half (45.7%) of all WV geotagged tweets were sent from Morgantown and Huntington. Charleston, WV’s largest city by population, came in fourth behind Parkersburg. The cities do not rank in order of population, so perhaps the younger student populations of Morgantown and Huntington pushed those two cities up the list, but that’s just a hypothesis. Other areas of WV represented to a lesser degree are Wheeling and Weirton in the northern panhandle and Martinsburg in the eastern panhandle. Fewer tweets were sent from the Fairmont/Clarksburg and Beckley areas. (West Virginia tweets that were not geotagged with a city were not considered.)

[Figure: Tweets by West Virginia city]

[Figure: Heat map of tweets by West Virginia city]

Sentiment of Tweets

Next, we looked at the sentiment of WV tweets compared to non-WV tweets. We used Idyl’s sentiment analyzer. (In case you are not familiar, Idyl is our product for performing text analysis.) The sentiment analysis algorithm determines whether the sentiment of a tweet is positive, negative, or neutral based on the text of the tweet. For example, the tweet “This place is great” has a positive sentiment while “This place is terrible” has a negative sentiment. We found WV tweets to be more positive than tweets from the rest of the country: 37.2% of WV tweets had a positive sentiment compared to 31.6% of tweets from the rest of the country. WV tweets were also slightly less negative (20.8% compared to 21.1%).

Sentiment   Count of WV Tweets   Count of Non-WV Tweets
Negative    172 (20.80%)         46,438 (21.07%)
Neutral     347 (41.96%)         104,308 (47.34%)
Positive    308 (37.24%)         69,604 (31.59%)
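
The percentages in the table follow directly from the raw counts. As a small sketch (the class and method names here are illustrative and are not part of Idyl), each category's share of the total can be recomputed like this:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative helper (not part of Idyl): turns raw sentiment counts into percentage shares.
final class SentimentShares {

    // Returns each label's share of the total as a percentage rounded to two decimals.
    static Map<String, Double> shares(Map<String, Integer> counts) {
        int total = 0;
        for (int count : counts.values()) {
            total += count;
        }
        if (total == 0) {
            throw new IllegalArgumentException("counts must not sum to zero");
        }
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            double percent = 100.0 * entry.getValue() / total;
            result.put(entry.getKey(), Math.round(percent * 100.0) / 100.0);
        }
        return result;
    }

    public static void main(String[] args) {
        // WV counts taken from the table above.
        Map<String, Integer> wv = new LinkedHashMap<>();
        wv.put("Negative", 172);
        wv.put("Neutral", 347);
        wv.put("Positive", 308);
        System.out.println(shares(wv));
    }
}
```

Running the same method on the non-WV counts gives the right-hand column of the table.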

Tweet Content

As for the content of the tweets, it was all over the board. There were tweets about the NBA finals, school being out, and random conversations. Perhaps a larger sample size would expose more specific topics.

Thanks for reading and stay tuned for further updates.

Posted by / June 21, 2015

Idyl Extraction Engine Java SDK on Maven Central

The Idyl Extraction Engine Java SDK is now available in the Maven Central repository:


The SDK is licensed under the Apache Software License, version 2.0 and the source code is available on Bitbucket.

The SDK provides an IdylAmiClient that has functions for submitting text for entity extraction and interacting with the optionally integrated services. An example invocation of entity extraction using the SDK is:

Posted by / January 12, 2015

Idyl Extraction Engine .NET SDK Available through NuGet

The Idyl Extraction Engine .NET SDK is now available through NuGet. Similar to the Java SDK, the .NET SDK for the Idyl AMI provides the ability to submit text to the Idyl AMI entity extraction engine and parse the returned entities. The Idyl AMI .NET SDK is licensed under the Apache Software License, version 2.0.

Idyl AMI SDK for .NET on NuGet

Use the SDK for easy integration of Idyl’s entity extraction capabilities into your .NET applications. The source code of the SDK is available on Bitbucket. We welcome any feedback on the SDK.

Posted by / January 10, 2015

Announcing the Idyl Extraction Engine on the AWS Marketplace

We are very excited to announce that Idyl Extraction Engine is now available through the AWS Marketplace. Now you can have person entity extraction capabilities inside your own cloud with no request limits, no contracts, and zero initial investment, and the first 7 days are free.

The Idyl AMI for Person Entities is a turn-key person entity extraction solution. Through a simple webservice (REST) interface, Idyl AMI’s extraction capabilities can be integrated into your text processing systems and solutions.

Idyl AMI includes support for integrating with other AWS services:

  • DynamoDB integration allows for storing your extracted entities.
  • Automatically put your extracted entities onto an SQS queue for later processing.
  • Trigger SNS notifications when entities are extracted.
  • Submit extraction metrics to CloudWatch to monitor extraction times.

These integrations are all optional and can be used in combination with each other.

Launch the Idyl AMI for Person Entities in your cloud today from the AWS Marketplace.

Posted by / January 8, 2015

Entity Extraction for Tweets

We have added entity extraction capabilities for tweets to the Idyl Cloud API. The tweet extraction endpoint can be accessed through Mashape and through Idyl Cloud accounts. Support for extracting entities from tweets will be added to the Idyl Cloud SDKs in the coming week.

Posted by / January 1, 2015

Idyl Cloud API on Mashape

The Idyl Cloud APIs for language detection and person entity recognition are now available through Mashape. Look for more Idyl Cloud APIs to be added to Mashape soon.

Idyl Cloud is a webservice for performing natural language processing. Learn more about Idyl Cloud at

Posted by / December 29, 2014

Distribution of Entity Confidence Values in a Sample Data Set

In a previous post titled Tuning the Confidence Threshold Parameter we described how the confidence threshold parameter can be used to control the strictness of the entity extraction. We would like to now give a little more insight into the parameter.

We recently extracted entities from more than 500,000 documents with Idyl. These documents were mostly news and news-like articles. (We say “news-like” because some did not follow the traditional format of a news article.) During the extraction we tracked the confidence value of each entity. When the processing was complete, we randomly selected 10,000 of the entities and produced the histogram of the confidence values shown below. (The Y-axis is the number of entities having the confidence value on the X-axis.)

As the histogram shows, nearly all of the entities extracted had a confidence value greater than 50. In our spot checks, none of the entities with a confidence value less than 50 were actual entities, so they could be discarded. (They included things like abbreviations.) Between 60 and 80 the entities were more reliable, with about 75% of them being actual entities. Nearly all entities extracted with a confidence value greater than 80 were actual entities. We only spot-checked the extracted entities in this investigation, but in a follow-up post we will provide numbers and percentages.
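
A histogram like the one described above simply counts confidence values into fixed-width buckets. The sketch below assumes confidence values on a 0 to 100 scale and a bucket width of 10. The class name, the bucket width, and the sample values are illustrative assumptions, not the data or tooling used for this post.

```java
import java.util.Arrays;

// Illustrative sketch (not part of Idyl): bucket confidence values for a histogram.
final class ConfidenceHistogram {

    // Counts values into equal-width buckets covering 0 to 100.
    // bucketWidth must be positive and divide 100 evenly.
    static int[] bucketCounts(double[] confidences, int bucketWidth) {
        if (bucketWidth <= 0 || 100 % bucketWidth != 0) {
            throw new IllegalArgumentException("bucketWidth must divide 100");
        }
        int[] counts = new int[100 / bucketWidth];
        for (double confidence : confidences) {
            if (confidence < 0 || confidence > 100) {
                throw new IllegalArgumentException("confidence out of range: " + confidence);
            }
            // A value of exactly 100 is placed in the last bucket.
            int index = Math.min((int) (confidence / bucketWidth), counts.length - 1);
            counts[index]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample values, for illustration only.
        double[] sample = {12.5, 55, 61, 79.9, 80, 100};
        System.out.println(Arrays.toString(bucketCounts(sample, 10)));
    }
}
```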

The takeaway from all this is that a confidence threshold of 80 is probably a safe choice. You can always, of course, tweak the value later if you find that you need to.

Thanks for reading!

Posted by / April 28, 2014

OpenSSL “Heartbleed” Vulnerability

Our systems were upgraded to the patched versions of OpenSSL earlier in the week and we re-keyed our SSL certificate. We recommend that all users change their passwords and generate new API keys.

Posted by / April 10, 2014

Tuning the Confidence Threshold Parameter

When you start using Idyl you’ll see the confidence threshold parameter when extracting entities. In this post we want to shed some light on this parameter.

When Idyl looks for entities it is not a binary “yes or no” operation. The Idyl engine will have more confidence that some words or phrases constitute entities and less confidence in others. With the confidence threshold parameter you can tell Idyl to not extract any entities if Idyl’s confidence level in the entity is less than the value you provide.

Valid values for the confidence threshold parameter range from 0 to 100. Keep in mind that entities rarely, if ever, achieve a 100% confidence level. Most will fall in the 60-90% range, but it really depends on your text and can vary (described below). If you need Idyl to return entities with a lower confidence level, you can just change the confidence threshold parameter in your API request. If you don’t specify a value for the confidence threshold parameter, it will default to 0, meaning that all identified entities will be returned.
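
The behavior described above can be sketched as a simple filter. This is only an illustration of the semantics (a 0 to 100 range, a default of 0, and dropping entities whose confidence is below the threshold). The ScoredEntity class and the filter method are hypothetical and are not the Idyl API or its implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of the threshold semantics; not the Idyl implementation.
final class ThresholdFilter {

    // Default described in the post: 0 means every identified entity is returned.
    static final int DEFAULT_THRESHOLD = 0;

    // Hypothetical stand-in for an extracted entity and its confidence (0 to 100).
    static final class ScoredEntity {
        final String text;
        final double confidence;

        ScoredEntity(String text, double confidence) {
            this.text = text;
            this.confidence = confidence;
        }
    }

    // Keeps entities whose confidence is at least the threshold.
    // A null threshold means "not specified" and falls back to the default.
    static List<ScoredEntity> filter(List<ScoredEntity> entities, Integer threshold) {
        int effective = (threshold == null) ? DEFAULT_THRESHOLD : threshold;
        if (effective < 0 || effective > 100) {
            throw new IllegalArgumentException("threshold must be between 0 and 100");
        }
        List<ScoredEntity> kept = new ArrayList<>();
        for (ScoredEntity entity : entities) {
            if (entity.confidence >= effective) {
                kept.add(entity);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up entities and confidence values, for illustration only.
        List<ScoredEntity> entities = Arrays.asList(
                new ScoredEntity("John Smith", 91.0),
                new ScoredEntity("Mr.", 42.0));
        System.out.println(filter(entities, null).size()); // no threshold given: both kept
        System.out.println(filter(entities, 60).size());   // only "John Smith" kept
    }
}
```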

Which confidence threshold value to use depends upon your data. If you start with a value of 60 and notice that some entities are not being detected, try lowering the value. The Idyl Demo uses a value of 0, so you can use it to see Idyl’s confidence level for a sample of your input.
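
One way to choose a starting value is to collect the confidence values returned with a threshold of 0 and count how many entities would survive at several candidate thresholds. The sketch below does that for a handful of made-up confidence values. The class name, the sample values, and the candidate thresholds are assumptions for illustration.

```java
// Illustrative sketch (not part of Idyl): compare candidate confidence thresholds.
final class ThresholdSweep {

    // Counts how many confidence values are at or above the given threshold.
    static int countAtOrAbove(double[] confidences, int threshold) {
        int count = 0;
        for (double confidence : confidences) {
            if (confidence >= threshold) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up confidence values, as if collected with a threshold of 0.
        double[] observed = {35, 58, 62, 71, 88, 93};
        int[] candidates = {0, 40, 60, 80};
        for (int threshold : candidates) {
            System.out.println("threshold " + threshold + ": "
                    + countAtOrAbove(observed, threshold) + " entities kept");
        }
    }
}
```

If the count drops sharply between two candidates, spot-checking the entities that fall in that gap shows whether the stricter value is discarding real entities.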

We hope this provides some insight into the purpose and function of the confidence threshold value. If you have any questions please comment or shoot an email to

Posted by / April 1, 2014

Updates to Idyl SDKs

Recently we announced that a subset of Idyl’s APIs is available on Mashape. Currently, only Idyl’s entity extraction API and language detection API are exposed through Mashape. We have updated the Idyl SDKs to be able to use the Mashape API endpoints. Example code snippets are shown in Mashape’s readme for Idyl.

Happy coding!

Posted by / March 20, 2014

Idyl Engine Update

In the past week we deployed an update to Idyl’s querying engine. This update greatly improves the performance of executing SPARQL queries. Queries on small entity contexts probably won’t see much improvement but users with large numbers of entities will notice improvements in query response time.

Posted by / March 13, 2014

New Support Help Desk!

As part of our effort to give you a better experience we have just migrated our customer support processes to a new help desk. We believe the capabilities and features provided by the new help desk will help us help you better. (We all win!)

What does this mean to you? You can now create support tickets at If you email with a support request we will create a helpdesk ticket on your behalf.

Over the next few weeks we will be populating the helpdesk’s FAQs and solutions with the goal of documenting many common problems, questions, and solutions.

If you have any comments or questions please always feel free to drop us a line at

Posted by / March 3, 2014

Using the Idyl SDK with Maven

The Idyl API is described on the API page. The Idyl interface is just a set of REST webservices. To reduce the time necessary to develop for Idyl, we have created wrappers for it in both Java and .NET. Here’s a quick look at how to use the Java SDK with Maven.

First, add our repository to your pom.xml:

<name>Mountain Fog Repository</name>

Next, add the Idyl SaaS SDK dependency:


Now with that done you can move on to the fun stuff. Here’s a snippet of using the SDK for extracting entities:

// Set your Idyl API key.
final String apiKey = "HFPL37MZAP03JFXS";

// Set the text to be sent to Idyl.
final String sentence = "John Smith is a person.";

IdylClient idylClient = new IdylClient(apiKey);

ExtractEntitiesRequest request = new ExtractEntitiesRequest(sentence);

// If you want to correlate entities set the context and optionally the doc id:
// request.setContext("contextA");
// request.setDocId("document1");

ExtractEntitiesResponse response = idylClient.extractEntities(request);

// Show the http status code.
System.out.println("Http status code: " + response.getHttpResponseCode());

// Check the extracted entities.
System.out.println("Extracted entities: " + response.getEntities().size());

// Loop over the entities.
for (Entity entity : response.getEntities()) {
    System.out.println("Entity: " + entity.getEntity() + ", Type: " + entity.getType());
}

All you need to do is replace the example API key with your Idyl API key and set your sentence value. And that is all. The request will be sent to Idyl and the extracted entities will be printed.

If you want to store the extracted entities so you can query them later, uncomment the two lines that set the context and document ID and set your values. Think of the context as the name for a collection of documents and the document ID as the name for a single document. For example, if you were extracting entities from books, the context could be the type of book (fiction, nonfiction) or the author, and the document ID could be each book’s title. Now, with stored entities, you can use the SDK to query those entities. The code follows the same pattern as above:

final String apiKey = "HFPL37MZAP03JFXS";
final String query = "SELECT ?entity WHERE { <> <> ?entity . }";

IdylClient idylClient = new IdylClient(apiKey);

QueryRequest request = new QueryRequest(query);

QueryResponse response = idylClient.query(request);

// Show the http status code.
System.out.println("Http status code: " + response.getHttpResponseCode());

// The result of the query will be an RDF/XML string.

This code executes a SPARQL query on your entities that simply returns all entity names under the context testcontext and document ID doc1. The query will be sent to Idyl and the result will be returned as an RDF/XML string.
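
To make the relationship between contexts, document IDs, and entities concrete, here is a minimal in-memory sketch. It is only a mental model: the EntityStore class and its methods are hypothetical, and this is not how Idyl stores entities or what the SDK exposes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model: context -> document ID -> entity names.
final class EntityStore {

    private final Map<String, Map<String, List<String>>> byContext = new HashMap<>();

    // Records an entity under the given context (collection) and document ID.
    void add(String context, String docId, String entity) {
        byContext.computeIfAbsent(context, key -> new HashMap<>())
                 .computeIfAbsent(docId, key -> new ArrayList<>())
                 .add(entity);
    }

    // Returns the entities stored for one document, or an empty list if none exist.
    List<String> entitiesIn(String context, String docId) {
        return byContext
                .getOrDefault(context, Collections.<String, List<String>>emptyMap())
                .getOrDefault(docId, Collections.<String>emptyList());
    }

    public static void main(String[] args) {
        EntityStore store = new EntityStore();
        // Following the book example: the context is the type of book, the document ID is the title.
        store.add("fiction", "Example Novel", "John Smith");
        store.add("fiction", "Example Novel", "Jane Doe");
        store.add("nonfiction", "Example Biography", "John Smith");
        System.out.println(store.entitiesIn("fiction", "Example Novel"));
    }
}
```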

The use of the .NET SDK is very similar but we will describe it soon!

Posted by / February 10, 2014


Welcome to our new blog! We’re excited to blog and share information about our services with you. Right now we are busy getting Idyl ready for prime time. Idyl will soon be open to beta users. If you would like to be notified when it is ready, please let us know.

We’re also on Twitter! Send us a tweet @mtnfog.

Posted by / January 5, 2014