ABSTRACT:
SUBMITTER: Sarker A
PROVIDER: S-EPMC5144647 | biostudies-literature | 2017 Feb
REPOSITORIES: biostudies-literature
Sarker Abeed A, Gonzalez Graciela G
Data in Brief, 2016 Nov 23
In this data article, we present an unlabeled corpus and a set of language models to the data science, natural language processing, and public health communities. We collected the data from Twitter using drug names as keywords, including their common misspellings. Using this data, which is rich in drug-related chatter, we developed language models to aid the development of data mining tools and methods in this domain. We generated several models that capture (i) distributed word representation ...
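The keyword-based collection described above can be illustrated with a minimal sketch of the matching step only: each canonical drug name is listed alongside spelling variants, and a tweet is kept if any token matches. This is not the authors' pipeline. The drug names, the misspelled variants, the `DRUG_VARIANTS` table, and the `match_drugs` helper are all hypothetical, and querying Twitter itself is not shown.

```python
import re

# Hypothetical keyword table: canonical drug name -> spelling variants.
# These entries are illustrative only, not the article's actual keyword list.
DRUG_VARIANTS = {
    "seroquel": ["seroquel", "seroquil", "seroqel"],
    "adderall": ["adderall", "aderall", "adderal"],
}

# Invert the table so each variant maps back to its canonical drug name.
VARIANT_TO_DRUG = {
    variant: drug
    for drug, variants in DRUG_VARIANTS.items()
    for variant in variants
}

# Lowercase alphanumeric tokens; punctuation and apostrophes split tokens.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def match_drugs(tweet: str) -> set[str]:
    """Return canonical drug names whose name or a listed variant appears as a token."""
    tokens = TOKEN_RE.findall(tweet.lower())
    return {VARIANT_TO_DRUG[t] for t in tokens if t in VARIANT_TO_DRUG}


if __name__ == "__main__":
    tweets = [
        "took my aderall and still can't focus",
        "seroquil knocked me out last night",
        "nothing drug related here",
    ]
    for t in tweets:
        print(sorted(match_drugs(t)), "<-", t)
```

Mapping every variant back to one canonical name means that tweets using a misspelling are still grouped under the correct drug.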