New AWS NLP Service

The AWS re:Invent conference in Las Vegas always brings announcements of new AWS services. This year AWS announced a new addition to its cloud-based NLP services.

Amazon Comprehend Medical – Natural Language Processing for Healthcare Customers – is a service for understanding unstructured natural language medical text. According to the announcement, it supports extracting entities from a vocabulary of medical terms and extracting Protected Health Information (PHI) such as addresses and medical record numbers. For a full description and code samples, see the AWS blog post. Pricing is based on usage of the service.
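As a quick, hedged illustration (assuming your AWS CLI version includes the comprehendmedical commands and you are working in a region where the service is available, such as us-east-1), entity and PHI extraction can be tried from the command line:

aws comprehendmedical detect-entities \
    --region us-east-1 \
    --text "Patient is a 45-year-old male with hypertension, prescribed lisinopril 10 mg daily."

aws comprehendmedical detect-phi \
    --region us-east-1 \
    --text "John Smith, MRN 123456, lives at 100 Main Street."

Both calls return JSON listing the detected entities along with their categories and character offsets.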

Apache cTAKES

While this is an interesting and exciting new product, I would be remiss not to mention that this functionality is largely available in the open source application cTAKES. cTAKES, or “clinical Text Analysis and Knowledge Extraction System”, is an Apache project for extracting information from the natural language clinical text in medical records. cTAKES is used by many large hospitals and referenced in many publications.

Being open source, cTAKES is free to use and modify. You can deploy cTAKES on-premises or in your cloud without paying any usage fees; you only pay for the hardware it runs on. Depending on your usage, the cost difference between a service like Amazon Comprehend Medical and cTAKES could be significant, so I recommend evaluating both if you need to process medical records.


AWS T3 Instance

Here’s how to quickly create a T3 instance from an existing T2 instance. T3 instances require ENA networking support, so the approach is to register a new AMI from a snapshot of the T2 instance’s root volume with ENA support enabled (the operating system on the volume must already include the ENA driver, as recent Amazon Linux AMIs do).

  1. Take a snapshot of the EBS volume.
  2. Run the following AWS CLI command, providing the snapshot ID and customizing the name and description properties:
aws ec2 register-image \
    --architecture "x86_64" \
    --ena-support \
    --name "AMI Name" \
    --description "AMI Description" \
    --root-device-name "/dev/sda1" \
    --block-device-mappings "[{\"DeviceName\": \"/dev/sda1\",\"Ebs\": {\"SnapshotId\": \"snap-000111222333444\"}}]" \
    --virtualization-type "hvm"

Now you can launch a T3 instance from the newly created AMI.
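For example (hypothetical AMI, key pair, and subnet identifiers), launching from the new AMI with the AWS CLI might look like:

aws ec2 run-instances \
    --image-id ami-0123456789abcdef0 \
    --instance-type t3.micro \
    --key-name my-keypair \
    --subnet-id subnet-0123456789abcdef0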

Creating Custom Tokenization Models with Sonnet Tokenization Engine

Sonnet Tokenization Engine 1.1.0 includes the ability to train custom token models from your text. Using your own token model provides improved accuracy because the model more closely matches the text you need to tokenize. This post describes how to launch an instance of Sonnet Tokenization Engine on AWS, connect to it, train a custom token model, and then use it.

To get started, let’s launch an instance of Sonnet Tokenization Engine from the AWS Marketplace. On the product page, click the orange “Continue to Subscribe” button.


On the next page, we highly recommend selecting a VPC from the VPC Settings options; doing so allows you to launch Sonnet Tokenization Engine on newer instance types. Select your VPC and a public subnet.

Now, select an instance type. We recommend a t2.micro for this demonstration. In production you will likely want a larger instance type.

Now click the “Launch with 1-Click” button!

An instance of Sonnet Tokenization Engine will now be starting in your AWS account. Head over to your EC2 console to check it out. By default, port 22 (SSH) is not open to the instance for security reasons, so let’s open it. Click on the instance’s security group, click Inbound Rules, and add a rule for port 22.
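If you prefer the AWS CLI, the same rule can be added with a command like the following (hypothetical security group ID; replace the CIDR with your own IP address rather than opening SSH to the world):

aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp \
    --port 22 \
    --cidr 203.0.113.10/32

With port 22 open, let’s SSH into the instance.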

ssh -i keypair.pem ec2-user@ec2-34-201-136-186.compute-1.amazonaws.com

Sonnet Tokenization Engine is installed under /opt/sonnet.

cd /opt/sonnet

Training a custom token model requires training data. The format for this data is a single sentence per line with tokens separated by whitespace or <SPLIT>. You can download sample training data for this exercise.

wget https://s3.amazonaws.com/mtnfog-public/token.train -O /tmp/token.train
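For reference, each line in this format is a single sentence whose tokens are separated by whitespace, with <SPLIT> marking token boundaries that have no whitespace between them. The lines below are hypothetical examples, not taken from the sample file:

Sonnet was launched from the AWS Marketplace<SPLIT>.
The model<SPLIT>, once trained<SPLIT>, will handle English tokenization requests<SPLIT>.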

We also need a training definition file. Again, we can download one for this exercise:

wget https://s3.amazonaws.com/mtnfog-public/token-training-definition.xml -O /tmp/token-training-definition.xml

Using these two files we are now ready to train our model.

sudo su sonnet
./bin/train-model.sh /tmp/token-training-definition.xml

The output will look similar to:

Sonnet Token Model Generator
Version: 1.1.0
Beginning training using definition file: /tmp/token-training-definition.xml
2018-03-17 12:47:46,135 DEBUG [main] models.ModelOperationsUtils (ModelOperationsUtils.java:40) - Using OpenNLP data format.
2018-03-17 12:47:46,260 INFO  [main] training.TokenModelOperations (TokenModelOperations.java:282) - Beginning tokenizer model training. Output model will be: /tmp/token.bin
Indexing events with TwoPass using cutoff of 0

	Computing event counts...  done. 6002 events
	Indexing...  done.
Collecting events... Done indexing in 0.54 s.
Incorporating indexed data for training...  
done.
	Number of Event Tokens: 6002
	    Number of Outcomes: 2
	  Number of Predicates: 6290
Computing model parameters...
Performing 100 iterations.
  1:  . (5991/6002) 0.9981672775741419
  2:  . (5995/6002) 0.9988337220926358
  3:  . (5996/6002) 0.9990003332222592
  4:  . (5997/6002) 0.9991669443518827
  5:  . (5996/6002) 0.9990003332222592
  6:  . (5998/6002) 0.9993335554815062
  7:  . (5998/6002) 0.9993335554815062
  8:  . (6000/6002) 0.9996667777407531
  9:  . (6000/6002) 0.9996667777407531
 10:  . (6000/6002) 0.9996667777407531
Stopping: change in training set accuracy less than 1.0E-5
Stats: (6002/6002) 1.0
...done.
Compressed 6290 parameters to 159
1 outcome patterns
Entity model generated complete. Summary:
Model file   : /tmp/token.bin
Manifest file : token.bin.manifest
Time Taken    : 2690 ms

The model file and its associated manifest file have now been created. Copy the manifest file to Sonnet’s models directory.

cp /tmp/token.bin.manifest /opt/sonnet/models/

Now start/restart Sonnet.

sudo service sonnet restart

The model will be loaded and ready for use. All tokenization API requests for the model’s language will now be processed by the custom model. To try it:

curl "http://ec2-34-201-136-186.compute-1.amazonaws.com:9040/api/tokenize?language=eng" -d "Tokenize this text please." -H "Content-Type: text/plain"

Intel “Meltdown” and “Spectre” Vulnerabilities

With the recent announcement of the vulnerabilities known as “Spectre” and “Meltdown” in Intel processors, we have written this post to inform our users how to protect virtual machines running our products launched via cloud marketplaces.

Products Launched via Docker Containers

Docker uses the host’s system kernel. Refer to your host OS’s documentation on applying the necessary kernel patch.
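As a rough sketch (commands vary by distribution; Amazon Linux is shown here as an example), checking and patching the host kernel looks like:

uname -r                  # check the host's current kernel version
sudo yum update kernel    # apply the patched kernel package
sudo reboot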

Products Launched via the AWS Marketplace

The following product versions are using kernel 4.9.62-21.56.amzn1.x86_64, which needs to be updated.

  • Renku Language Detection Engine 1.0.0
  • Prose Sentence Extraction Engine 1.0.0
  • Sonnet Tokenization Engine 1.0.0
  • Idyl E3 Entity Extraction Engine 3.0.0

Run the following commands on each instance:

sudo yum update
sudo reboot
uname -r

The output of the last command will show an updated kernel version of 4.9.76-3.78.amzn1.x86_64 (or newer). Details are available on the AWS Amazon Linux Security Center.

Products Launched via the Azure Marketplace

The following product versions are running on CentOS 7.3 on kernel 3.10.0-514.26.2.el7.x86_64, which needs to be updated.

  • Renku Language Detection Engine 1.0.0
  • Prose Sentence Extraction Engine 1.0.0
  • Sonnet Tokenization Engine 1.0.0
  • Idyl E3 Entity Extraction Engine 3.0.0

Run the following commands on each virtual machine:

sudo yum update
sudo reboot
uname -r

The output of the last command will show an updated kernel version of 3.10.0-693.11.6.el7.x86_64 (or newer). For more information see the Red Hat Security Advisory and the announcement email.


Idyl E3 Entity Extraction Engine AWS Reference Architectures

With the popularity of running Idyl E3 Entity Extraction Engine on AWS, we wanted to provide some AWS reference architectures to help you get started deploying Idyl E3 to AWS. Don’t forget Idyl E3 is available on the AWS Marketplace for easy launching, and we have some Idyl E3 CloudFormation templates available on GitHub. We also offer managed Idyl E3 services if you prefer a hands-off approach to Idyl E3 deployment and operation.

A Few Notes Before Starting

Using a Pre-Configured AMI

No matter which architecture you choose, we recommend creating a pre-configured Idyl E3 AMI and using it to launch new Idyl E3 instances. This method is recommended over relying on user-data scripts to perform the configuration because the time required to spin up a pre-configured AMI can be significantly less than with user-data scripts. If you want to keep the AMI configuration under source control, I highly recommend using HashiCorp’s Packer to build the AMI.
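As a rough sketch (the template name is hypothetical), the Packer workflow boils down to validating and building a template that uses the amazon-ebs builder to launch a temporary instance, run your Idyl E3 installation and configuration scripts, and capture the result as an AMI:

packer validate idyl-e3-ami.json
packer build idyl-e3-ami.json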

Stateless API

Before we describe the architectures, it is helpful to note that the Idyl E3 API is stateless. There is no session data that needs to be shared by multiple instances, and as long as all Idyl E3 instances are configured identically (as they should be when behind a load balancer), it does not matter which instance an entity extraction request is routed to. We can take advantage of this statelessness to scale Idyl E3 up (and down) as much as we need in order to meet the demands of the load.

Load-balanced Architecture

The first architecture is a very simple one, yet it is probably adequate to meet the needs of most users. It has a single VPC that contains two subnets. One subnet is public and contains an Elastic Load Balancer (ELB); the other subnet is private and contains the Idyl E3 instances. In the diagram shown below, the ELB is a public ELB, allowing Idyl E3 requests to be received from the internet. However, if your application will also run in the VPC, you can change the ELB to an internal ELB. Note that this architecture uses a fixed number of Idyl E3 instances behind the ELB; any scaling up or down will have to be performed manually when needed. Idyl E3’s API has a /health endpoint that returns HTTP 200 OK when everything is OK, which makes it perfect for ELB instance health checks.

Simple Idyl E3 AWS Architecture with VPC and ELB
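For example, with a classic ELB the health check can be pointed at the /health endpoint using the AWS CLI (hypothetical load balancer name; replace 9000 with the port your Idyl E3 instances actually listen on):

aws elb configure-health-check \
    --load-balancer-name idyl-e3-elb \
    --health-check Target=HTTP:9000/health,Interval=30,Timeout=5,UnhealthyThreshold=2,HealthyThreshold=2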

Load-balanced and Auto-scaling Architecture


The previous architecture is simple but very functional, and it minimizes cost. The first thing you will notice about it, though, is the static nature of the Idyl E3 instances. To provide some flexibility, we can modify the architecture a bit to put the Idyl E3 instances into an Auto Scaling group. We can use the group’s Desired Capacity to manually control the number of Idyl E3 instances, or we can configure the Auto Scaling group to automatically scale up and down based on some chosen metrics. Average CPU usage is a good metric for scaling Idyl E3 because entity extraction can cause CPU usage to rise. With that change, here is what our architecture looks like now:

Idyl E3 AWS architecture with VPC, ELB, and autoscaling.

With autoscaling, we don’t have to worry about unexpected surges or decreases in entity extraction requests. The number of Idyl E3 instances will automatically scale up and down based on the average CPU usage across all Idyl E3 instances. Scaling down is important in order to keep costs to a minimum; nobody wants to pay for more than what they need.
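As a sketch (hypothetical Auto Scaling group name and target value), a target tracking policy on average CPU can be attached with the AWS CLI:

aws autoscaling put-scaling-policy \
    --auto-scaling-group-name idyl-e3-asg \
    --policy-name idyl-e3-cpu-target \
    --policy-type TargetTrackingScaling \
    --target-tracking-configuration '{"PredefinedMetricSpecification":{"PredefinedMetricType":"ASGAverageCPUUtilization"},"TargetValue":60.0}'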

This architecture is available in our GitHub repository of Idyl E3 CloudFormation Templates. The template also contains an optional bastion instance to facilitate SSH access into the Idyl E3 instances from outside the VPC.

Need more?

Got more complicated requirements? Let us know. We have AWS certified engineers on staff and we’ll be glad to help.

Amazon EBS Elastic Volumes

On February 13, 2017, Amazon Web Services announced elastic EBS volumes! If you have used EC2 much, you have undoubtedly been frustrated by the rigidity of EBS volumes. Once created, they could not be modified or resized. If your EC2 instance required more disk space, your only option was to manually create a new volume of the desired size and attach it to your instance. Now that EBS volumes are more “elastic”, you can simply resize an EBS volume. I put “elastic” in quotes because the volume size can only be increased, not decreased. That’s more elastic than before but still not completely elastic. In addition to adjusting size, you can now adjust performance and change the volume type, even while the volume is in use. These functions are available for your existing EBS volumes.

You can use the AWS CLI to modify a volume:

aws ec2 modify-volume --region us-east-1 --volume-id vol-11111111111111111 --size 200 --volume-type io1 --iops 10000

After enlarging a volume, don’t forget to tell your OS to use the newly allocated storage.
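For example, on a Linux instance with an ext4 root filesystem on /dev/xvda1 (adjust device names for your instance, and use xfs_growfs instead of resize2fs for XFS filesystems), the partition and filesystem can be grown in place:

sudo growpart /dev/xvda 1
sudo resize2fs /dev/xvda1
df -h    # confirm the new size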

This can make life a lot easier in many situations. As described in the AWS blog post, you can use this functionality in combination with CloudWatch and Lambda to automatically enlarge volumes when running low on disk space. You can also use it simply to save money by starting with a smaller EBS volume than you might need, knowing you have the flexibility to increase the volume’s capacity when needed.

Why do we find this interesting? Our Idyl E3 managed services run in AWS, and we encourage all potential customers to launch Idyl E3 from the AWS Marketplace due to its ease of use and turn-key capabilities, so we like to pass interesting and relevant information about related services on to our users and readers when it becomes available. Learn more about Idyl E3’s entity extraction capabilities.