Dataset Information

Machine Learning and Natural Language Processing for Geolocation-Centric Monitoring and Characterization of Opioid-Related Social Media Chatter.


ABSTRACT:

Importance

Automatic curation of consumer-generated, opioid-related social media big data may enable real-time monitoring of the opioid epidemic in the United States.

Objective

To develop and validate an automatic text-processing pipeline for geospatial and temporal analysis of opioid-mentioning social media chatter.

Design, setting, and participants

This cross-sectional, population-based study was conducted from December 1, 2017, to August 31, 2019, and used more than 3 years of publicly available social media posts on Twitter, dated from January 1, 2012, to October 31, 2015, that were geolocated in Pennsylvania. Opioid-mentioning tweets were extracted using prescription and illicit opioid names, including street names and misspellings. Social media posts (tweets) (n = 9006) were manually categorized into 4 classes, and training and evaluation of several machine learning algorithms were performed. Temporal and geospatial patterns were analyzed with the best-performing classifier on unlabeled data.
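The keyword-based extraction step described above can be illustrated with a minimal sketch. The term list below is a small hypothetical lexicon chosen for illustration only; it is not the study's keyword list, and the function names `mentions_opioid` and `matched_terms` are assumptions introduced here.

```python
import re

# Hypothetical, illustrative lexicon: NOT the study's actual keyword list.
# It mixes a generic name, a brand name, a street name, and two misspellings.
OPIOID_TERMS = ["oxycodone", "oxycodon", "percocet", "percs", "heroin", "herion"]

# Whole-word, case-insensitive match on any term in the lexicon.
PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, OPIOID_TERMS)) + r")\b",
    re.IGNORECASE,
)


def mentions_opioid(text):
    """Return True if the text contains at least one lexicon term."""
    return PATTERN.search(text) is not None


def matched_terms(text):
    """Return the distinct lexicon terms found in the text, lowercased and sorted."""
    return sorted({match.lower() for match in PATTERN.findall(text)})


posts = [
    "Took two Percs last night",
    "What a heroic effort today",
    "herion and Heroin are the same thing, just spelled wrong",
]
opioid_posts = [post for post in posts if mentions_opioid(post)]
```

Word-boundary matching keeps "heroic" from matching "heroin". Posts retrieved this way would still need classification, since a keyword match alone does not indicate abuse.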

Main outcomes and measures

Pearson and Spearman correlations of county- and substate-level abuse-indicating tweet rates with opioid overdose death rates from the Centers for Disease Control and Prevention WONDER database and with 4 metrics from the National Survey on Drug Use and Health for 3 years were calculated. Classifier performances were measured through microaveraged F1 scores (harmonic mean of precision and recall) or accuracies and 95% CIs.
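The measures named above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the study's evaluation code: the class labels and toy data are hypothetical, and which classes the study pooled into its microaverage is not stated here. Microaveraging pools true positives, false positives, and false negatives across the chosen classes before computing precision and recall. Spearman correlation is computed as the Pearson correlation of ranks.

```python
from math import sqrt


def micro_f1(gold, pred, labels):
    """Micro-averaged F1: pool TP, FP, and FN over the given labels."""
    tp = fp = fn = 0
    for label in labels:
        for g, p in zip(gold, pred):
            if p == label and g == label:
                tp += 1
            elif p == label:
                fp += 1
            elif g == label:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy labels for illustration only.
gold = ["abuse", "information", "unrelated", "abuse"]
pred = ["abuse", "unrelated", "unrelated", "abuse"]
score = micro_f1(gold, pred, ["abuse", "information", "unrelated"])
```

When every class is included and each post has exactly one label, the microaveraged F1 equals accuracy, as in the toy data, where 3 of 4 predictions are correct.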

Results

A total of 9006 social media posts were annotated, of which 1748 (19.4%) were related to abuse, 2001 (22.2%) were related to information, 4830 (53.6%) were unrelated, and 427 (4.7%) were not in the English language. Yearly rates of abuse-indicating social media posts showed statistically significant correlation with county-level opioid-related overdose death rates (n = 75) for 3 years (Pearson r = 0.451, P ...

Conclusions and relevance

The correlations obtained in this study suggest that a social media-based approach reliant on supervised machine learning may be suitable for geolocation-centric monitoring of the US opioid epidemic in near real time.

SUBMITTER: Sarker A 

PROVIDER: S-EPMC6865282 | biostudies-literature | 2019 Nov

REPOSITORIES: biostudies-literature

Publications

Machine Learning and Natural Language Processing for Geolocation-Centric Monitoring and Characterization of Opioid-Related Social Media Chatter.

Sarker A, Gonzalez-Hernandez G, Ruan Y, Perrone J

JAMA Network Open, 2019 Nov 1, issue 11


Similar Datasets

| S-EPMC8132982 | biostudies-literature
| S-EPMC8325804 | biostudies-literature
| S-EPMC10337393 | biostudies-literature
| S-EPMC7845988 | biostudies-literature
| S-EPMC9124945 | biostudies-literature
| S-EPMC6309052 | biostudies-literature
| S-EPMC7557517 | biostudies-literature
| S-EPMC9653489 | biostudies-literature
| S-EPMC11368170 | biostudies-literature
| S-EPMC7393999 | biostudies-literature