Idyl 1.3.1 is now available on the AWS Marketplace. This version brings minor changes and comes with a free 30-day trial.
Mountain Fog is a West Virginia company, and as such we take an interest in the social media use of West Virginians. From June 9, 2015, to June 19, 2015, we sampled tweets and divided them into two categories: tweets that were sent from West Virginia and tweets that were sent from the other 49 states. Our goal was to compare the two categories for similarities and differences.
We captured approximately 209,000 tweets; of those, about 800, or about 0.4%, originated in West Virginia. (It is interesting to note that WV’s population represents 0.58% of the United States’ population according to 2014 Census Bureau estimates.)
Tweets by City
Almost half (45.7%) of all WV geotagged tweets were sent from Morgantown and Huntington. Charleston, WV’s largest city by population, came in fourth behind Parkersburg. The ranking does not follow population, so perhaps the younger, student populations of Morgantown and Huntington helped put those two cities at the top, but that’s just a hypothesis. Other areas of WV represented to a lesser degree are Wheeling and Weirton in the northern panhandle and Martinsburg in the eastern panhandle. Fewer tweets were sent from the Fairmont/Clarksburg and Beckley areas. (The West Virginia tweets that were not geotagged with a city were not considered.)
|Tweets by West Virginia City|
|Heat map of tweets by West Virginia city|
Sentiment of Tweets
Next, we compared the sentiment of WV tweets with that of non-WV tweets using Idyl’s sentiment analyzer. (In case you are not familiar, Idyl is our product for performing text analysis.) The sentiment analysis algorithm determines whether the sentiment of a tweet is positive, negative, or neutral based on the text of the tweet. For example, the tweet “This place is great” has a positive sentiment while “This place is terrible” has a negative sentiment. We found WV tweets to be more positive than tweets from the rest of the country: about 37% of WV tweets had a positive sentiment, compared to about 32% of the tweets from the rest of the country. WV tweets were also slightly less negative (20.8% compared to 21.07%).
|Sentiment|Count of WV Tweets|Count of Non-WV Tweets|
|---|---|---|
|Negative|172 (20.80%)|46,438 (21.07%)|
|Neutral|347 (41.96%)|104,308 (47.34%)|
|Positive|308 (37.24%)|69,604 (31.59%)|
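For readers who want to check the percentages in the table, here is a minimal Python sketch that recomputes each sentiment's share from the raw counts. The counts are copied from the table; the variable and function names are our own and are not part of Idyl.

```python
# Counts copied from the table above.
wv_counts = {"negative": 172, "neutral": 347, "positive": 308}
non_wv_counts = {"negative": 46_438, "neutral": 104_308, "positive": 69_604}


def sentiment_shares(counts):
    """Return each sentiment's share of the total, as a percentage."""
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}


wv_shares = sentiment_shares(wv_counts)
non_wv_shares = sentiment_shares(non_wv_counts)

for label in ("negative", "neutral", "positive"):
    # Difference in percentage points between WV and the rest of the country.
    diff = wv_shares[label] - non_wv_shares[label]
    print(
        f"{label:>8}: WV {wv_shares[label]:.2f}%  "
        f"non-WV {non_wv_shares[label]:.2f}%  "
        f"difference {diff:+.2f} points"
    )
```

The positive share comes out several percentage points higher for WV, while the negative shares differ by well under one point.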
As for content, the tweets were all over the board. There were tweets about the NBA finals, school being out, and random conversations. Perhaps a larger sample size would expose more specific topics.
Thanks for reading and stay tuned for further updates.
The Idyl Extraction Engine (Places Entities) is now available on the AWS Marketplace! This is a turnkey solution for extracting place entities from natural language English text. Instead of having to make requests out of your network, you can now extract places right in your own cloud network. A free, no-risk 7-day trial is available.
A short post today. The Idyl Cloud API is listed on the ProgrammableWeb. Check it out at http://www.programmableweb.com/api/idyl-cloud.
The SDK provides an IdylAmiClient that has functions for submitting text for entity extraction and interacting with the optionally integrated services. An example invocation of entity extraction using the SDK is:
The Idyl Extraction Engine .NET SDK is now available through NuGet. Similar to the Java SDK, the .NET SDK for the Idyl AMI provides the ability to submit text to the Idyl AMI entity extraction engine and parse the returned entities. The Idyl AMI .NET SDK is licensed under the Apache Software License, version 2.0.
Use the SDK for easy integration of Idyl’s entity extraction capabilities into your .NET applications. The source code of the SDK is available on Bitbucket. We welcome any feedback on the SDK.
The Idyl AMI for Person Entities is a turn-key person entity extraction solution. Through a simple webservice (REST) interface, Idyl AMI’s extraction capabilities can be integrated into your text processing systems and solutions.
Idyl AMI includes support for integrating with other AWS services:
- DynamoDB integration allows for storing your extracted entities.
- Automatically put your extracted entities onto an SQS queue for later processing.
- Trigger SNS notifications when entities are extracted.
- Submit extraction metrics to CloudWatch to monitor extraction times.
These integrations are all optional and can be used in combination with each other.
We have added entity extraction capabilities for tweets to the Idyl Cloud API. The tweet extraction endpoint can be accessed through Mashape and through Idyl Cloud accounts. Support for extracting entities from tweets will be added to the Idyl Cloud SDKs in the coming week.
The Idyl Cloud APIs for language detection and person entity recognition are now available through Mashape. Look for more Idyl Cloud APIs to be added to Mashape soon.
Idyl Cloud is a webservice for performing natural language processing. Learn more about Idyl Cloud at http://www.mtnfog.com/idyl/.
In the near future we will be making Idyl AMIs (Amazon Machine Images) available through the AWS Marketplace. These AMIs will contain a turnkey named-entity recognition solution. Stay tuned!
In a previous post titled Tuning the Confidence Threshold Parameter we described how the confidence threshold parameter can be used to control the strictness of the entity extraction. We would like to now give a little more insight into the parameter.
We recently extracted entities from more than 500,000 documents with Idyl. These documents were mostly news and news-like articles. (We say “news-like” because some did not follow the traditional format of a news article.) During the extraction we tracked the confidence value of each entity. When the processing was complete we randomly selected 10,000 of the entities and produced the histogram of the confidence values shown below. (The Y-axis is the number of entities having the confidence value on the X-axis.)
As the histogram shows, nearly all of the entities extracted had a confidence value greater than 50. In our spot checks, none of the entities with a confidence value less than 50 were actual entities, so they could be discarded. (They included things like abbreviations.) Between 60 and 80 the entities were more reliable, with about 75% of them being actual entities. Nearly all entities that were extracted with a confidence value greater than 80 were actual entities. We only spot checked the extracted entities in this investigation, but in a follow-up post we will provide numbers and percentages.
The takeaway from all this is that a confidence threshold of 80 is probably a safe choice. You can always, of course, tweak the value later if you find that you need to.
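The sampling and tallying step described above can be sketched in a few lines of Python. This is only an illustration: the confidence values are randomly generated stand-ins (the real values from our run are not reproduced here), and the bin width of 10 is an assumption made for the example.

```python
import random
from collections import Counter

# Hypothetical stand-in for the tracked confidence values (0 to 100).
# The distribution is invented purely for illustration.
rng = random.Random(42)
all_confidences = [
    min(100.0, max(0.0, rng.gauss(75, 12))) for _ in range(50_000)
]

# Randomly select 10,000 of the values, as described in the post.
sample = rng.sample(all_confidences, 10_000)


def histogram(values, bin_width=10):
    """Count values per bin; each bin is keyed by its lower edge."""
    counts = Counter()
    for value in values:
        # A value of exactly 100 goes into the top bin rather than its own.
        lower = min(int(value // bin_width) * bin_width, 100 - bin_width)
        counts[lower] += 1
    return dict(sorted(counts.items()))


bins = histogram(sample)
for lower, count in bins.items():
    print(f"{lower:3d}-{lower + 10:3d}: {count}")
```

With real confidence values in place of the generated ones, the printed counts give the same information as the histogram: how many sampled entities fall into each confidence range.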
Thanks for reading!
Our systems were upgraded to the patched versions of OpenSSL earlier in the week and we re-keyed our SSL certificate. We recommend that all users change their passwords and generate new API keys.
When you start using Idyl you’ll see the confidence threshold parameter when extracting entities. In this post we want to shed some light on this parameter.
When Idyl looks for entities it is not a binary “yes or no” operation. The Idyl engine will have more confidence that some words or phrases constitute entities and less confidence in others. With the confidence threshold parameter you can tell Idyl not to extract an entity if Idyl’s confidence level in that entity is less than the value you provide.
Valid values for the confidence threshold parameter range from 0 to 100. Keep in mind that entities rarely, if ever, achieve a 100% confidence level. Most will fall in the 60-90% range, but it really depends on your text and can vary (described below). If you need Idyl to return entities with a lower confidence level, just lower the confidence threshold parameter in your API request. If you don’t specify a value for the confidence threshold parameter, it defaults to 0, meaning that all identified entities will be returned.
What confidence threshold value to use depends upon your data. If you start with a value of 60 and notice that some entities are not being detected, try lowering the value. The Idyl Demo uses a value of 0, so you can use it to see Idyl’s confidence level for a sample of your input.
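To make the effect of the threshold concrete, here is a small Python sketch of the filtering behavior described above. It is not Idyl's implementation or SDK: the `Entity` class, the `apply_threshold` function, and the sample entities with their confidence values are all hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    """Hypothetical extracted entity; not the Idyl SDK's own type."""
    text: str
    confidence: float  # 0 to 100


def apply_threshold(entities, threshold=0):
    """Keep entities whose confidence is at least `threshold`.

    The default of 0 mirrors the behavior described above: every
    identified entity is returned.
    """
    if not 0 <= threshold <= 100:
        raise ValueError("confidence threshold must be between 0 and 100")
    return [e for e in entities if e.confidence >= threshold]


# Made-up entities and confidence values, for illustration only.
extracted = [
    Entity("George Washington", 91.2),
    Entity("Morgantown", 67.5),
    Entity("WV", 38.0),
]

print([e.text for e in apply_threshold(extracted)])      # all three entities
print([e.text for e in apply_threshold(extracted, 60)])  # drops "WV"
print([e.text for e in apply_threshold(extracted, 80)])  # only "George Washington"
```

Raising the threshold trades recall for precision: fewer entities come back, but the ones that do are the ones the engine is most sure about.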
We hope this provides some insight into the purpose and function of the confidence threshold value. If you have any questions please comment or shoot an email to email@example.com.
Recently we announced that a subset of Idyl’s APIs is available on Mashape. Currently, only Idyl’s entity extraction API and language detection API are exposed through Mashape. We have updated the Idyl SDKs to be able to use the Mashape API endpoints. Example code snippets are shown in Mashape’s readme for Idyl.
In the past week we deployed an update to Idyl’s querying engine. This update greatly improves the performance of executing SPARQL queries. Queries on small entity contexts probably won’t see much improvement but users with large numbers of entities will notice improvements in query response time.
As part of our effort to give you a better experience we have just migrated our customer support processes to a new help desk. We believe the capabilities and features provided by the new help desk will help us help you better. (We all win!)
What does this mean to you? You can now create support tickets at https://mtnfog.freshdesk.com. If you email firstname.lastname@example.org with a support request we will create a helpdesk ticket on your behalf.
Over the next weeks we will be populating the helpdesk’s FAQs and solutions with the goal of documenting many common problems, questions, and solutions.
If you have any comments or questions please always feel free to drop us a line at email@example.com.
The Idyl API is described on the api page. The Idyl interface is just a set of REST webservices. To reduce the time necessary to develop for Idyl we have created wrappers for it in both Java and .NET. Here’s a quick look at how to use the Java SDK with Maven.
First, add our repository to your pom.xml:
Next, add the Idyl SaaS SDK dependency:
Now with that done you can move on to the fun stuff. Here’s a snippet of using the SDK for extracting entities:
All you need to do is replace the example API key with your Idyl API key and set your sentence value. And that is all. The request will be sent to Idyl and the extracted entities will be printed.
If you want to store the extracted entities so you can query them later, uncomment the two lines that set the context and document ID and set your values. Think of the context as the name for a collection of documents and the document ID as the name for a single document. For example, if you were extracting entities from books the context could be the type of book (fiction, nonfiction) or the author and the document ID could be each book’s title. Now with stored entities you can use the SDK to query those entities. The code follows the same pattern as above:
This code executes a SPARQL query on your entities that simply returns all entity names under the context testcontext and the document ID doc1. The query will be sent to Idyl and the returned entities will be printed.
The use of the .NET SDK is very similar but we will describe it soon!
Welcome to our new blog! We’re excited to blog and share information about our services with you. Right now we are busy getting Idyl ready for prime time. Idyl will soon be open for beta users. If you would like to be notified when it is ready please let us know.
We’re also on Twitter! Send us a tweet @mtnfog.