Dataset Information

Promoting Reproducible Research for Characterizing Nonmedical Use of Medications Through Data Annotation: Description of a Twitter Corpus and Guidelines.

ABSTRACT:

BACKGROUND: Social media data are increasingly being used for population-level health research because they provide near real-time access to large volumes of consumer-generated data. Recently, a number of studies have explored the possibility of using social media data, such as from Twitter, for monitoring prescription medication abuse. However, there is a paucity of annotated data or guidelines for data characterization that discuss how information related to abuse-prone medications is presented on Twitter.

OBJECTIVE: This study discusses the creation of an annotated corpus suitable for training supervised classification algorithms for the automatic classification of medication abuse-related chatter. The annotation strategies used for improving interannotator agreement (IAA), a detailed annotation guideline, and machine learning experiments that illustrate the utility of the annotated corpus are also described.

METHODS: We employed an iterative annotation strategy, holding interannotator discussions and updating the annotation guidelines at each iteration to improve IAA for the manual annotation task. Using the grounded theory approach, we first characterized tweets into fine-grained categories and then grouped them into 4 broad classes: abuse or misuse, personal consumption, mention, and unrelated. After completing the manual annotations, we experimented with several machine learning algorithms to illustrate the utility of the corpus and to generate baseline performance metrics for automatic classification on these data.

RESULTS: Our final annotated set consisted of 16,443 tweets mentioning at least 20 abuse-prone medications, including opioids, benzodiazepines, atypical antipsychotics, central nervous system stimulants, and gamma-aminobutyric acid analogs. Our final overall IAA was 0.86 (Cohen kappa), which represents high agreement. The manual annotation process revealed the variety of ways in which prescription medication misuse or abuse is discussed on Twitter, including expressions indicating coingestion, nonmedical use, nonstandard route of intake, and consumption above the prescribed doses. Among the machine learning classifiers, support vector machines obtained the highest automatic classification accuracy, 73.00% (95% CI 71.4-74.5), over the test set (n=3271).

CONCLUSIONS: Our manual analysis and annotations of a large number of tweets have revealed the types of information posted on Twitter about a set of abuse-prone prescription medications and their distributions. In the interests of reproducible and community-driven research, we have made our detailed annotation guidelines and the training data for the classification experiments publicly available; the test data will be used in future shared tasks.
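The abstract summarizes annotation quality as Cohen's kappa and classifier performance as accuracy with a 95% CI. The following is a minimal sketch of both calculations, assuming each annotator's labels are plain Python lists of class names. The function names `cohen_kappa` and `wilson_interval` are hypothetical. The abstract does not say which interval method the authors used, so the Wilson score interval here is an assumed stand-in, not their procedure.

```python
from collections import Counter
from math import sqrt


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("expected two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label throughout; kappa is
        # undefined here, so this sketch returns 1.0 by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def wilson_interval(correct, n, z=1.96):
    """Wilson score interval (assumed method) for a proportion such as accuracy."""
    if n <= 0 or not 0 <= correct <= n:
        raise ValueError("need n > 0 and 0 <= correct <= n")
    p = correct / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # Toy labels using the four broad class names; not data from the corpus.
    a = ["abuse or misuse", "mention", "abuse or misuse", "unrelated"]
    b = ["abuse or misuse", "mention", "personal consumption", "unrelated"]
    print(round(cohen_kappa(a, b), 3))
    # 73.00% of n=3271 corresponds to about 2388 correct predictions.
    low, high = wilson_interval(2388, 3271)
    print(f"{low:.3%} to {high:.3%}")
```

With 2388 of 3271 test tweets classified correctly, the Wilson interval comes out at roughly 71.5% to 74.5%. This is close to, but not identical with, the reported 71.4-74.5. The small gap presumably reflects a different interval method or unrounded counts.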

SUBMITTER: O'Connor K

PROVIDER: S-EPMC7066507 | biostudies-literature | 2020 Feb

REPOSITORIES: biostudies-literature

Publications

Promoting Reproducible Research for Characterizing Nonmedical Use of Medications Through Data Annotation: Description of a Twitter Corpus and Guidelines.

Karen O'Connor, Abeed Sarker, Jeanmarie Perrone, Graciela Gonzalez Hernandez

Journal of Medical Internet Research, 26 Feb 2020, issue 2

PMID: 32130117

Similar Datasets

Project description: Background: Chronic pain affects more than 20% of adults in the United States and is associated with substantial physical, mental, and social burden. Clinical text contains rich information about chronic pain, but no systematic appraisal has been performed to assess the electronic health record (EHR) narratives for these patients. A formal content analysis of the unstructured EHR data can inform clinical practice and research in chronic pain. Objective: We characterized individual episodes of chronic pain by annotating and analyzing EHR notes for a stratified cohort of adults with known chronic pain. Methods: We used the Rochester Epidemiology Project infrastructure to screen all residents of Olmsted County, Minnesota, for evidence of chronic pain between January 1, 2005, and September 30, 2015. Diagnosis codes were used to assemble a cohort of 6586 chronic pain patients; people with cancer were excluded. The records of an age- and sex-stratified random sample of 62 patients from the cohort were annotated using an iteratively developed guideline. The annotated concepts included date, location, severity, causes, effects on quality of life, diagnostic procedures, medications, and other treatment modalities. Results: A total of 94 chronic pain episodes from 62 distinct patients were identified by reviewing 3272 clinical notes. Documentation was written by clinicians across a wide spectrum of specialties. Most patients (40/62, 65%) had 1 pain episode during the study period. Interannotator agreement ranged from 0.78 to 1.00 across the annotated concepts. Some pain-related concepts (eg, body location) had 100% (94/94) coverage among all the episodes, while others had moderate coverage (eg, effects on quality of life: 55/94, 59%). Back pain and leg pain were the most common types of chronic pain in the annotated cohort. Musculoskeletal issues such as arthritis were annotated as the most common causes. Opioids were the most commonly captured medication, while physical and occupational therapies were the most common nonpharmacological treatments. Conclusions: We systematically annotated chronic pain episodes in clinical text. The rich content analysis results revealed the complexity of chronic pain episodes and their management, as well as the challenges in extracting pertinent information, even for humans. Despite the pilot nature of the study, the annotation guideline and corpus should serve as informative references for other institutions with a shared interest in chronic pain research using EHRs.

Project description: Purpose: Both the superstructures of virtual discourse in radiation oncology and the entities occupying influential positions in the social media landscape of radiation oncology remain poorly characterized. Methods and materials: NodeXL Pro was used to prospectively sample all tweets with the hashtag #radonc every 8 to 10 days over 1 year (December 4, 2018, to November 29, 2019). Twitter handles were grouped into conversational clusters using the Clauset-Newman-Moore community detection algorithm. For each sample period, the top 10 #radonc Twitter influencers, defined using betweenness centrality, were categorized. Influencers were scored in each sample period according to their top 10 influence rank, and the scores were summarized with descriptive statistics. Linear regression was used to assess which characteristics predicted higher influence scores among top influencers. Results: In the study, 684,000 tweets were sampled over 38 periods. #radonc tweets took on the crowd superstructure of a hub-and-spoke broadcast network, which forms when prominent individuals are widely repeated by many audience members. Professional societies were the most influential category of Twitter handles, with an average influence score of 7.63 out of 10 (standard deviation [SD] = 1.94). When industry handles were present among the top 10 influencers, they had the second highest average influence scores (6.75, SD = 1.06), followed by individuals (5.28, SD = 0.43). The categories of influencers were stable over the course of 1 year. The role of attending physician, radiation oncology specialty, male sex, academic practice, and US-based handles in North America were predictors of higher influence score. Conclusions: Twitter influencers in radiation oncology represent a diverse group of people and organizations, but male academic radiation oncologists based in North America occupy particularly influential positions in virtual communities broadly characterized as "hub-and-spoke" broadcast networks. Periodic network-based analyses of social media discourse in radiation oncology are warranted to maintain awareness of the handles that are influencing discussions on Twitter and to ensure that social media use continues to contribute meaningfully to the field of radiation oncology.

Project description: Background: The prevalence of abuse, diversion, and web-based endorsement of tapentadol (extended-release [ER], immediate-release [IR]) has been characterized as low compared with other prescription opioids. Little is known about individual experience with tapentadol nonmedical use (NMU). Objective: This study aims to pilot web-based survey technologies to investigate the motivation for tapentadol NMU, sources of procurement, routes of administration, tampering methods, doses used, and impressions of tapentadol products (Nucynta and Nucynta ER). Methods: Recruitment flyers and banner advertisements were placed on the Bluelight website [DragonByte Technologies Ltd] with a link to a web-based survey (Qualtrics) designed to query individuals about their lifetime tapentadol NMU. This web-based survey was followed by an interactive web-based chat (Cryptocat) with respondents who were willing to be contacted. Respondents were queried about sources for obtaining tapentadol, motives for use, routes of administration, tampering methods, drugs used in combination, tablet strengths and dosages, and reasons for continued or discontinued use. Desirability and attractiveness for NMU were rated. Results: Web-based recruitment successfully attracted difficult-to-find study participants. A total of 78 participants reported that tapentadol was obtained from friends and family (ER 11/30, 37%; IR 18/67, 27%), the internet (ER 11/30, 37%; IR 12/67, 18%), or their own prescriptions from a doctor (ER 9/30, 30%; IR 17/67, 25%). It was used nonmedically for pain relief (ER 18/30, 60%; IR 33/67, 49%) and for multiple psychotropic effects, including relaxation (ER 13/30, 43%; IR 29/67, 43%), reduction in depression or anxiety (ER 7/30, 23%; IR 30/67, 45%), or getting high (ER 12/30, 40%; IR 33/67, 49%). Tapentadol was primarily swallowed (ER 22/30, 73%; IR 55/67, 82%), although snorting (ER 2/30, 7%; IR 8/67, 12%) and injection (ER 2/30, 7%; IR 5/67, 8%) were also reported. The preferred dose for NMU was 100 mg (both ER and IR). Participants reported tapentadol use with benzodiazepines (ER 12/21, 57%; IR 28/47, 60%). Most participants had discontinued tapentadol NMU at the time of survey completion (ER 22/30, 73%; IR 55/67, 82%). Reasons for discontinued ER NMU included side effects (10/22, 46%) and lack of effective treatment (10/22, 46%). Reasons for discontinued IR NMU included lack of access (26/55, 47%) and better NMU options (21/55, 38%). Few individuals were willing to divulge identifying information about themselves for the interactive chat (8/78, 10%), demonstrating the strength of anonymous, web-based surveys. The interactive chat supported the survey findings. A subgroup of participants (4/78, 5%) reported hallucinogenic side effects at high doses. Conclusions: Web-based surveys can successfully recruit individuals who report drug NMU and who are otherwise difficult to find. Tapentadol NMU appears to occur primarily for pain relief and for its psychotropic effects. Although it was liked by some, tapentadol did not receive a robust pattern of endorsement for NMU.

Project description: Manually annotated corpora are critical for the training and evaluation of automated methods to identify concepts in biomedical text. This paper presents the concept annotations of the Colorado Richly Annotated Full-Text (CRAFT) Corpus, a collection of 97 full-length, open-access biomedical journal articles that have been annotated both semantically and syntactically to serve as a research resource for the biomedical natural-language-processing (NLP) community. CRAFT identifies all mentions of nearly all concepts from nine prominent biomedical ontologies and terminologies: the Cell Type Ontology, the Chemical Entities of Biological Interest ontology, the NCBI Taxonomy, the Protein Ontology, the Sequence Ontology, the entries of the Entrez Gene database, and the three subontologies of the Gene Ontology. The first public release includes the annotations for 67 of the 97 articles, reserving two sets of 15 articles for future text-mining competitions (after which these too will be released). Concept annotations were created based on a single set of guidelines, which has enabled us to achieve consistently high interannotator agreement. As the initial 67-article release contains more than 560,000 tokens (and the full set more than 790,000 tokens), our corpus is among the largest gold-standard annotated biomedical corpora. Unlike most others, the journal articles that comprise the corpus are drawn from diverse biomedical disciplines and are marked up in their entirety. Additionally, with a concept-annotation count of nearly 100,000 in the 67-article subset (and more than 140,000 in the full collection), the scale of conceptual markup is also among the largest of comparable corpora. The concept annotations of the CRAFT Corpus have the potential to significantly advance biomedical text mining by providing a high-quality gold standard for NLP systems. The corpus, annotation guidelines, and other associated resources are freely available at http://bionlp-corpora.sourceforge.net/CRAFT/index.shtml.

Project description: The COVID-19 pandemic is a global crisis that has been testing every society and exposing the critical role of local politics in crisis response. In the United States, there has been a strong partisan divide between the Democratic and Republican parties' narratives about the pandemic, which resulted in polarization of individual behaviors and divergent policy adoption across regions. As shown in this case, as well as in most major social issues, strongly polarized narrative frameworks facilitate such narratives. To understand polarization and other social chasms, it is critical to dissect these diverging narratives. Here, taking the Democratic and Republican political social media posts about the pandemic as a case study, we demonstrate that a combination of computational methods can provide useful insights into the different contexts, framing, and characters and relationships that construct the narrative frameworks from which individual posts draw. Leveraging a dataset of tweets from politicians in the U.S., including the ex-president, members of Congress, and state governors, we found that the Democrats' narrative tends to be more concerned with the pandemic as well as financial and social support, while the Republicans discuss other political entities, such as China, more often. We then performed an automatic framing analysis to characterize the ways in which they frame their narratives and found that the Democrats emphasize the government's role in responding to the pandemic, whereas the Republicans emphasize the roles of individuals and support for small businesses. Finally, we present a semantic role analysis that uncovers the important characters and relationships in their narratives as well as how they facilitate a membership categorization process. Our findings concretely expose the gaps in the "elusive consensus" between the two parties. Our methodologies may be applied to computationally study narratives in various domains. Supplementary information: The online version contains supplementary material available at 10.1140/epjds/s13688-021-00308-4.

Tweets