A publicly traded index focused on bitcoin investments is using NLP to select the holdings. From the press release:
The index underlying KOIN was constructed utilizing a natural language processing algorithm that screens for global stocks that are believed to have a current or future economic interest in blockchain technology. By harnessing the power of textual analysis and artificial intelligence, companies are uncovered that might otherwise be overlooked by traditional analytical research.
It’s true there is a lot of information in unstructured text, but to make that information useful it needs to be extracted and understood at scale. This new fund is a great example of a practical use of NLP. If we take a minute to think about the requirements for a system like this, we can identify these items:
- Scalable – The system has to process an enormous amount of text quickly. News doesn’t stop or take a break to let us catch up. The system must scale horizontally to meet demand.
- Multi-lingual – Blockchain news isn’t just written in English or any other single language. The system must be able to support text documents written in many different languages. We’re interested in global stocks.
- Customizable – Press releases and news reports represent two specific categories of text. They aren’t like other categories such as legal documents, encyclopedia articles, or general human conversation text. The system needs to be customizable so that it can be tuned to the category of text it is processing. A general, all-purpose document processor won’t give us the results we need.
- NLP – The system likely needs to be able to process natural language text and identify key topics, generate summaries, identify entities (companies and persons), and detect sentiment.
There are, of course, always other requirements, but these are arguably the most important ones.
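To make the screening idea concrete, here is a minimal sketch in Python. It is not the algorithm behind the index, which the press release doesn’t describe. It is a toy keyword screen that assumes a hypothetical term list (`BLOCKCHAIN_TERMS`), a hypothetical threshold (`min_hits`), and made-up sample documents. A real system would add entity recognition, multi-lingual support, and sentiment on top of something like this.

```python
import re
from collections import Counter

# Hypothetical keyword list; a real screen would be far richer and multi-lingual.
BLOCKCHAIN_TERMS = {"blockchain", "bitcoin", "cryptocurrency", "distributed ledger"}


def count_term_hits(text, terms=BLOCKCHAIN_TERMS):
    """Count case-insensitive whole-word occurrences of the terms in text."""
    lowered = text.lower()
    hits = 0
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        hits += len(re.findall(pattern, lowered))
    return hits


def screen_companies(documents, min_hits=2):
    """Rank companies by total term hits across their documents.

    documents: iterable of (company_name, text) pairs.
    Returns (company_name, hits) pairs, highest hit count first,
    keeping only companies with at least min_hits (an assumed threshold).
    """
    totals = Counter()
    for company, text in documents:
        totals[company] += count_term_hits(text)
    ranked = [(name, n) for name, n in totals.items() if n >= min_hits]
    # Sort by hits descending, then by name so ties are deterministic.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


# Made-up sample data, for illustration only.
sample_docs = [
    ("Acme Payments", "Acme is piloting a blockchain settlement network. "
                      "The blockchain pilot starts in May."),
    ("Beta Foods", "Beta Foods opened a new bakery."),
    ("Gamma Bank", "Gamma Bank joined a distributed ledger consortium "
                   "and now custodies bitcoin."),
]

print(screen_companies(sample_docs))
# [('Acme Payments', 2), ('Gamma Bank', 2)]
```

Even this toy version shows why the requirements above matter: the term list only works for English, it treats a press release the same as any other text, and it has to run over every document for every company.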
How can we meet these requirements? To help provide scalability we can use an established cloud provider like AWS or Azure. These platforms give us the tools we need to make an application scale to meet demand, so that’s a good starting point. For the other requirements we can select from available tools based on whether we are building our own implementation from the ground up or using publicly available components. Both approaches have their own advantages and disadvantages. To save time (and money) we’ll assume you would rather use existing tools instead of building them yourself. If not, then you’d better stop reading and get to coding!