Datasets and Derived Artifacts

We have made the following datasets and derived artifacts available for use in NLP and text mining tasks. Refer to each item’s README file for important information on how that item was created. For additional information on any of the datasets available here, please contact us.

A free Mountain Fog account is required to access the downloads.

PubMed Open Access Subset (Commercial) Pretrained word2vec Vectors

Word vectors generated by word2vec from the commercial Open Access Subset of the PubMed collection of biomedical literature. Before training, the text was converted to lowercase and all punctuation was removed. These vectors were created in August 2019.
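
Because the vectors were trained on lowercased, punctuation-free text, tokens generally need the same normalization before they are looked up. The following is a minimal sketch of that step, not the exact pipeline used to build the vectors. It assumes that "punctuation" means the ASCII characters in Python's string.punctuation; the README may define a different set. The function names normalize and tokenize are illustrative.

```python
import string

# Assumption: the punctuation set is ASCII-only (string.punctuation).
# The preprocessing actually used for the vectors may differ; check the README.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize(text: str) -> str:
    """Lowercase the text and delete ASCII punctuation characters."""
    return text.lower().translate(_PUNCT_TABLE)


def tokenize(text: str) -> list[str]:
    """Normalize the text, then split it on whitespace."""
    return normalize(text).split()
```

Note that deleting punctuation joins hyphenated terms under this assumption: "p53-null" becomes the single token "p53null".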

PubMed Open Access Subset (Commercial) Pretrained fastText Vectors

Word vectors generated by fastText from the commercial Open Access Subset of the PubMed collection of biomedical literature. Before training, the text was converted to lowercase and all punctuation was removed. These vectors were created in August 2019. Refer to the fastText documentation for information on the file formats.
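
The fastText command-line tool typically writes a binary model (.bin) and a plain-text vector file (.vec). Whether this download includes a .vec file is an assumption; check the README. If it does, the text format can be read with the standard library alone, as in the sketch below. The format has a header line giving the vocabulary size and dimension, followed by one line per word holding the word and its space-separated values. The function name read_vec is illustrative.

```python
from typing import Dict, Iterable, List


def read_vec(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse word vectors in the word2vec/fastText text (.vec) layout.

    Assumed layout: a header line "<vocab_size> <dimension>", followed by
    one line per word: the word, then <dimension> space-separated floats.
    """
    it = iter(lines)
    header = next(it).split()
    dim = int(header[1])  # header[0] is the vocabulary size

    vectors: Dict[str, List[float]] = {}
    for line in it:
        line = line.rstrip()  # drops the newline and any trailing space
        if not line:
            continue
        # Split from the right so exactly `dim` values are taken.
        parts = line.rsplit(" ", dim)
        if len(parts) != dim + 1:
            raise ValueError(f"expected {dim} values, got line: {line[:40]!r}")
        vectors[parts[0]] = [float(v) for v in parts[1:]]
    return vectors
```

An open file object can be passed directly, since iterating over it yields lines. This sketch holds every vector in memory as Python lists, which may be impractical for a large vocabulary; a dedicated library is likely a better fit for real workloads.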
