Quick List of Pretrained Word Vectors

In the past few years word vectors have become all the rage in NLP, and rightly so. It’s hard today to find an application of NLP that doesn’t involve word vectors. The fact that word vectors are learned from unlabeled text using unsupervised learning makes them even more appealing.

In a future post we’ll take a look at what exactly word vectors are, but in this post I just want to give a quick list of pretrained word vectors that you can use now. There are several different algorithms and implementations for generating word vectors, with the most famous likely being word2vec.

word2vec – These pretrained vectors were trained on part of the Google News dataset containing about 100 billion words.

GloVe – The GloVe pretrained vectors come in several sets, each trained on a different corpus: a combination of Wikipedia and Gigaword, Common Crawl, and Twitter.

fastText – The fastText pretrained vectors were created from Wikipedia. They are available for 294 languages. 

Please note the license each set of pretrained vectors is released under before using it in your applications.
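Once you have downloaded one of these files, using it mostly comes down to reading it into a lookup table. Here is a minimal sketch, assuming the plain-text layout these projects distribute: one word per line followed by its space-separated values, with word2vec and fastText text files starting with an extra "count dimension" header line that GloVe files omit. The function names, the header heuristic, and the tiny sample vectors are all made up for illustration and are not part of any of these libraries.

```python
import io
import math


def load_vectors(lines, limit=None):
    """Parse text-format word vectors into a dict of word -> list of floats.

    `lines` is any iterable of strings, such as an open text file.
    `limit` optionally stops after that many words, which helps with large files.
    """
    vectors = {}
    for i, line in enumerate(lines):
        parts = line.rstrip().split(" ")
        if not parts[0]:
            continue  # skip blank lines
        # Heuristic (an assumption): a first line made of exactly two integers
        # is treated as a "count dimension" header and skipped.
        if i == 0 and len(parts) == 2 and parts[0].isdigit() and parts[1].isdigit():
            continue
        vectors[parts[0]] = [float(x) for x in parts[1:]]
        if limit is not None and len(vectors) >= limit:
            break
    return vectors


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


# Made-up 2-dimensional vectors standing in for a real download.
sample = "3 2\nking 1.0 0.0\nqueen 0.9 0.1\napple 0.0 1.0\n"
vecs = load_vectors(io.StringIO(sample))
print(cosine(vecs["king"], vecs["queen"]) > cosine(vecs["king"], vecs["apple"]))
```

With a real download you would pass an open file instead of the sample string, for example `open(path, encoding="utf-8")` where `path` points at the text file you fetched. This sketch does not read the binary format that some word2vec files use, and for the larger files you will probably want `limit` or a dedicated library.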
