Idyl E3 (Entity Extraction Engine) is a turnkey entity extraction product for private clouds. With separate editions available for different entity types, Idyl E3 is a compact, simple, and cost-effective solution because the cost is not affected by the number of API requests.
Idyl E3 supports storing extracted entities in an Entity Store and integration with MetaText for person entity disambiguation and enrichment services.
Obtaining and Installation
Idyl E3 is available for free from the Mountain Fog website and through various cloud marketplaces such as the AWS Marketplace. Through these marketplaces Idyl E3 is delivered as a self-contained virtual machine image ready to go out of the box. Follow the instructions of the cloud marketplace to launch Idyl E3 into your cloud.
When launched through a cloud marketplace no installation is required for Idyl E3. After the virtual machine boots and Idyl E3 initializes itself it will be ready for use. In your web browser go to http://ip-address:9000 to access the Idyl E3 Dashboard. At the dashboard you can view details about Idyl E3 and configure its settings, if needed.
Getting Started with Idyl E3
To get started using Idyl E3 go to the Idyl E3 dashboard at http://ip-address:9000. The default login varies based upon your deployment platform. The possible login credentials are:
Default Login Credentials
- If launched through the AWS Marketplace the default username is admin and the password is the instance ID (e.g. i-1234567).
- If launched on any other platform the default username is admin and the password is admin.
Entity Module Loading
When Idyl E3 starts, the entity modules are loaded. The entity models are the “brains” of Idyl E3 and they drive the entity extraction process. Loading can be time consuming depending on the enabled modules and the resources available to the underlying virtual machine. Until the extraction modules are loaded, all API requests receive an HTTP 503 Service Unavailable response and a warning message is shown on all dashboard pages. All other features of Idyl E3 remain available while the modules are being loaded.
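A client that starts sending requests right after boot may want to wait out the 503 responses described above. The following is a minimal sketch of such a wait loop, not part of Idyl E3 or its SDKs. The function names, retry count, and delay are assumptions chosen for illustration, and the URL to probe should be taken from the API documentation.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=5.0):
    """Return the HTTP status code for a GET request to url.

    Connection errors (for example, the VM is still booting) are not
    handled here and will propagate to the caller.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises for 4xx/5xx responses; the code is still available.
        return error.code


def wait_until_loaded(probe, attempts=60, delay=5.0, sleep=time.sleep):
    """Call probe() until it stops returning 503, or give up.

    Returns True once a non-503 status is seen and False after
    `attempts` unsuccessful tries.
    """
    for attempt in range(attempts):
        if probe() != 503:
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

For example, `wait_until_loaded(lambda: http_status(url))` blocks until the chosen API URL stops answering with 503, or returns False after roughly five minutes with the default settings.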
Using the Dashboard
The Idyl E3 dashboard provides the ability to perform manual entity extractions through a form on the dashboard. This form is meant to allow you to test Idyl E3 and its settings without having to send it an API request.
Idyl E3 has an internal rules engine that processes the extracted entities. The rules engine logic, implemented by the Drools rules engine, is extremely powerful and allows virtually any functionality in response to a desired entity. For example, a rule can be written to send an email message if an extracted entity matches a specific text. Idyl E3 includes support for sending email via AWS Simple Email Service, sending notifications via AWS Simple Notification Service, publishing to an AWS Simple Queue Service queue, and publishing to AWS Kinesis streams. You can implement your own custom functions to fit your needs. The rules are stored on the file system at /usr/share/idyl-e3/rules.
The Idyl E3 log is located at /var/log/idyl-e3/idyl-e3.log. Refer to this log when diagnosing problems and issues.
In the Extraction Modules settings you can select the types of entities to be extracted. Note that each named-entity extraction module requires approximately 4 GB of available RAM. Selecting multiple modules will increase the load time when Idyl E3 starts. Even if your virtual machine has the necessary RAM, you may need to make more of it available to Idyl E3. Refer to the Tuning Idyl E3 section below.
To enable MetaText disambiguation and enrichment services provide your MetaText API key or your Mashape application key. The MetaText integration will be enabled immediately and future entity extraction requests that contain person entities will utilize MetaText for entity disambiguation and enrichment.
By default Idyl E3’s API requires no authentication. To require an API key, change the Authentication Method from “None” to “API Key” and provide an API key. All API requests (except requests to /api/health and /api/status) will now require the API key to be present in the Authorization header.
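As a rough illustration of API key authentication, the sketch below builds a request with the key in the Authorization header using only the Python standard library. Two details are assumptions: the key is sent as the raw header value (the text does not say whether a scheme prefix is expected), and the path `/api/extract` is a hypothetical placeholder. Consult the API documentation for the real endpoint and header format.

```python
import urllib.request

# Hypothetical endpoint path, used only for illustration.
EXTRACT_PATH = "/api/extract"


def build_api_request(base_url, path, api_key, body=None):
    """Build a request that carries the API key in the Authorization header.

    Assumption: the key is sent as the raw header value, with no scheme
    prefix such as "Bearer".
    """
    data = body.encode("utf-8") if body is not None else None
    request = urllib.request.Request(base_url.rstrip("/") + path, data=data)
    request.add_header("Authorization", api_key)
    return request


request = build_api_request(
    "http://ip-address:9000", EXTRACT_PATH, "my-api-key", body="Some text."
)
```

The request can then be sent with `urllib.request.urlopen(request)` once `ip-address` is replaced with the address of your instance.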
The Entity Filters allow you to filter out false positives based on different criteria. By default all entity filters are enabled. You can disable certain filters if necessary to customize and improve the entity extraction for your text. The combination of filters that is most effective will vary based upon the contents of your text.
The entity store provides a method of storing extracted entities for later analysis and reference. Idyl E3 supports using any JDBC-compliant database or a DynamoDB table as an entity store. The entity store is disabled by default.
RDBMS Entity Store
By default, the RDBMS used is an embedded Apache Derby database. This database is not recommended for production use and is not open to outside connections. If using a custom database, the table schema for MySQL is:
--
-- Table structure for table `StoredEntities`
--
CREATE TABLE IF NOT EXISTS `StoredEntities` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `text` varchar(255) NOT NULL,
  `type` varchar(255) NOT NULL,
  `confidence` double NOT NULL,
  `context` varchar(255) DEFAULT NULL,
  `documentId` varchar(255) DEFAULT NULL,
  `uri` varchar(255) DEFAULT NULL,
  `extractionDate` date DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=1;

--
-- Table structure for table `StoredEntityEnrichments`
--
CREATE TABLE IF NOT EXISTS `StoredEntityEnrichments` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) NOT NULL,
  `value` varchar(255) NOT NULL,
  `entityId` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `FK_lsqk8i0sd1fa7oibjv9s0uihj` (`entityId`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=1;
DynamoDB Entity Store
The DynamoDB table should be created as:
- Hash key: context (String)
- Range key: id (String)
The DynamoDB entity store supports authentication via an access and secret key combination or via IAM instance roles. To use an IAM instance role, leave the access and secret key fields empty when configuring the entity store.
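The key schema above could be created programmatically. The sketch below assumes the boto3 library is installed and builds the arguments for its `create_table` call. The table name passed in is hypothetical, and on-demand billing is an assumption, since the text does not specify capacity settings. When no keys are configured, boto3's default credential lookup can use an IAM instance role, which mirrors the behavior described above.

```python
def entity_store_table_spec(table_name):
    """Return create_table arguments for the key schema described above."""
    return {
        "TableName": table_name,
        "KeySchema": [
            {"AttributeName": "context", "KeyType": "HASH"},  # hash key
            {"AttributeName": "id", "KeyType": "RANGE"},  # range key
        ],
        "AttributeDefinitions": [
            {"AttributeName": "context", "AttributeType": "S"},
            {"AttributeName": "id", "AttributeType": "S"},
        ],
        # Assumption: on-demand billing; the source does not specify capacity.
        "BillingMode": "PAY_PER_REQUEST",
    }


def create_entity_store_table(table_name, region_name):
    """Create the table with boto3, which must be installed separately."""
    import boto3  # imported lazily so the spec can be built without boto3

    client = boto3.client("dynamodb", region_name=region_name)
    return client.create_table(**entity_store_table_spec(table_name))
```

For example, `create_entity_store_table("IdylE3Entities", "us-east-1")` would create a table with a placeholder name in a placeholder region.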
Idyl E3 provides “integrations” to better embed it in your existing processes and workflows.
The CloudWatch integration, when enabled, reports entity extraction times to Amazon Web Services’ CloudWatch service. Using this integration allows you to monitor the performance of Idyl E3 and provides a mechanism for driving EC2 autoscaling.
The Amazon Web Services Simple Queue Service (SQS) integration, when enabled, causes Idyl E3 to consume entity extraction requests from an SQS queue. Entity extraction requests can be published to an SQS queue via the Idyl E3 SDKs or manually. Contact us for help using this feature.
The import/export settings feature lets you download a copy of Idyl E3’s current settings for backup and migration purposes. Please note that the settings in the exported backup file are not encrypted.
Idyl E3 can easily operate in a cluster by placing each Idyl E3 virtual machine behind a load balancer. In an AWS cloud, an Elastic Load Balancer can be used to distribute requests across multiple Idyl E3 instances. Autoscaling can be used to increase or decrease the number of Idyl E3 instances needed to satisfy demand. If any of Idyl E3’s settings must be modified to meet your needs it is best to create a new AMI based off the customized instance.
In other cloud types, an Apache or Nginx server can be configured as a reverse proxy in front of one or more Idyl E3 virtual machines.
Idyl E3 features a simple REST interface. See the API documentation for details on the REST interface.
SDKs for Java and .NET are available. These SDKs provide convenient methods for interacting with Idyl E3. The Java SDK is available through Maven Central and the .NET SDK is available through NuGet.
Our team is available to help with your development tasks related to Idyl E3 integration with your systems. Please contact us at firstname.lastname@example.org to discuss how we can assist with your integration at no cost to you.
Idyl E3 instances launched via the AWS Marketplace can be accessed via SSH using your private key. The username is ubuntu.
Idyl E3 virtual machines can be accessed via SSH on port 22. Refer to your cloud platform’s documentation for the user credentials or contact us for assistance.
When you are logged in to Idyl E3 via SSH, you can control the Idyl E3 service through the service command.
|sudo service idyl start||Starts the Idyl E3 service.|
|sudo service idyl stop||Stops the Idyl E3 service.|
There are a few important files and directories that you should be aware of:
|File / Directory||Purpose|
|/usr/share/idyl-e3/||The Idyl E3 home directory. Files under this directory generally should not be modified.|
|/usr/share/idyl-e3/logs/catalina.out||Idyl E3’s log file. Refer to this log file to diagnose issues.|
Configuring Firewall Access
Port requirements for Idyl E3 are:
|9000||The Idyl E3 dashboard and API are accessed via port 9000.||Yes|
|22||SSH access into the Idyl E3 virtual machine.||No|
It is highly recommended that these ports only be open to the hosts that require access to their provided services.
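After adjusting firewall rules, it can be useful to confirm from a client host that the expected ports are reachable. The following is a minimal sketch using only the Python standard library. The function name and timeout are illustrative choices and not part of Idyl E3.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, `port_is_open("ip-address", 9000)` should return True from a host that is allowed to reach the dashboard and API, and False from a host that the firewall blocks.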
Each Idyl E3 named-entity extraction module requires approximately 4 GB of RAM. Even if the Idyl E3 virtual machine has sufficient memory, you may still need to increase the amount of memory available to Idyl E3. You can do so by opening the /usr/share/idyl-e3/bin/setenv.sh file and modifying the -Xmx parameter. Restart Idyl E3 with sudo service idyl restart for the changes to take effect.
Because Idyl E3’s entity extraction is memory intensive, having more available RAM will provide increased performance and responsiveness.
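As a rough sizing aid, the 4 GB per module figure can be turned into a candidate -Xmx value. The sketch below is only an estimate based on that figure. The `headroom_gb` parameter is a hypothetical knob for adding extra memory beyond the per-module total and does not come from the Idyl E3 documentation.

```python
def xmx_flag(enabled_modules, gb_per_module=4, headroom_gb=0):
    """Return a -Xmx value sized for the enabled extraction modules.

    gb_per_module reflects the approximate 4 GB per module stated above;
    headroom_gb is an optional, hypothetical allowance on top of that.
    """
    if enabled_modules < 1:
        raise ValueError("at least one module must be enabled")
    return "-Xmx{}g".format(enabled_modules * gb_per_module + headroom_gb)
```

With two modules enabled and no headroom, `xmx_flag(2)` yields `-Xmx8g`, which is a starting point to tune rather than a guaranteed setting.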