Dataset Information

Discovering Cohorts of Pregnant Women From Social Media for Safety Surveillance and Analysis.

ABSTRACT: BACKGROUND:Pregnancy exposure registries are the primary sources of information about the safety of maternal usage of medications during pregnancy. Such registries enroll pregnant women in a voluntary fashion early on in pregnancy and follow them until the end of pregnancy or longer to systematically collect information regarding specific pregnancy outcomes. Although the model of pregnancy registries has distinct advantages over other study designs, they are faced with numerous challenges and limitations such as low enrollment rate, high cost, and selection bias. OBJECTIVE:The primary objectives of this study were to systematically assess whether social media (Twitter) can be used to discover cohorts of pregnant women and to develop and deploy a natural language processing and machine learning pipeline for the automatic collection of cohort information. In addition, we also attempted to ascertain, in a preliminary fashion, what types of longitudinal information may potentially be mined from the collected cohort information. METHODS:Our discovery of pregnant women relies on detecting pregnancy-indicating tweets (PITs), which are statements posted by pregnant women regarding their pregnancies. We used a set of 14 patterns to first detect potential PITs. We manually annotated a sample of 14,156 of the retrieved user posts to distinguish real PITs from false positives and trained a supervised classification system to detect real PITs. We optimized the classification system via cross validation, with features and settings targeted toward optimizing precision for the positive class. For users identified to be posting real PITs via automatic classification, our pipeline collected all their available past and future posts from which other information (eg, medication usage and fetal outcomes) may be mined. RESULTS:Our rule-based PIT detection approach retrieved over 200,000 posts over a period of 18 months. Manual annotation agreement for three annotators was very high at kappa (?)=.79. On a blind test set, the implemented classifier obtained an overall F1 score of 0.84 (0.88 for the pregnancy class and 0.68 for the nonpregnancy class). Precision for the pregnancy class was 0.93, and recall was 0.84. Feature analysis showed that the combination of dense and sparse vectors for classification achieved optimal performance. Employing the trained classifier resulted in the identification of 71,954 users from the collected posts. Over 250 million posts were retrieved for these users, which provided a multitude of longitudinal information about them. CONCLUSIONS:Social media sources such as Twitter can be used to identify large cohorts of pregnant women and to gather longitudinal information via automated processing of their postings. Considering the many drawbacks and limitations of pregnancy registries, social media mining may provide beneficial complementary information. Although the cohort sizes identified over social media are large, future research will have to assess the completeness of the information available through them.

SUBMITTER: Sarker A

PROVIDER: S-EPMC5684515 | biostudies-literature | 2017 Oct

REPOSITORIES: biostudies-literature

ACCESS DATA

Publications

Discovering Cohorts of Pregnant Women From Social Media for Safety Surveillance and Analysis.

Sarker Abeed A Chandrashekar Pramod P Magge Arjun A Cai Haitao H Klein Ari A Gonzalez Graciela G

Journal of medical Internet research 20171030 10

<h4>Background</h4>Pregnancy exposure registries are the primary sources of information about the safety of maternal usage of medications during pregnancy. Such registries enroll pregnant women in a voluntary fashion early on in pregnancy and follow them until the end of pregnancy or longer to systematically collect information regarding specific pregnancy outcomes. Although the model of pregnancy registries has distinct advantages over other study designs, they are faced with numerous challenge ...[more]

PMID: 29084707

Similar Datasets

Project description:IntroductionThere is growing interest in whether social media can capture patient-generated information relevant for medicines safety surveillance that cannot be found in traditional sources.ObjectiveThe aim of this study was to evaluate the potential contribution of mining social media networks for medicines safety surveillance using the following associations as case studies: (1) rosiglitazone and cardiovascular events (i.e. stroke and myocardial infarction); and (2) human papilloma virus (HPV) vaccine and infertility.MethodsWe collected publicly accessible, English-language posts on Facebook, Google+, and Twitter until September 2014. Data were queried for co-occurrence of keywords related to the drug/vaccine and event of interest within a post. Messages were analysed with respect to geographical distribution, context, linking to other web content, and author's assertion regarding the supposed association.ResultsA total of 2537 posts related to rosiglitazone/cardiovascular events and 2236 posts related to HPV vaccine/infertility were retrieved, with the majority of posts representing data from Twitter (98 and 85%, respectively) and originating from users in the US. Approximately 21% of rosiglitazone-related posts and 84% of HPV vaccine-related posts referenced other web pages, mostly news items, law firms' websites, or blogs. Assertion analysis predominantly showed affirmation of the association of rosiglitazone/cardiovascular events (72%; n = 1821) and of HPV vaccine/infertility (79%; n = 1758). Only ten posts described personal accounts of rosiglitazone/cardiovascular adverse event experiences, and nine posts described HPV vaccine problems related to infertility.ConclusionsPublicly available data from the considered social media networks were sparse and largely untrackable for the purpose of providing early clues of safety concerns regarding the prespecified case studies. Further research investigating other case studies and exploring other social media platforms are necessary to further characterise the usefulness of social media for safety surveillance.

Project description:BACKGROUND:Multiple sclerosis (MS) is a chronic neurological disease occurring mostly in women of childbearing age. Pregnant women with MS are usually excluded from clinical trials; as users of the internet, however, they are actively engaged in threads and forums on social media. Social media provides the potential to explore real-world patient experiences and concerns about the use of medicinal products during pregnancy and breastfeeding. OBJECTIVE:This study aimed to analyze the content of posts concerning pregnancy and use of medicines in online forums; thus, the study aimed to gain a thorough understanding of patients' experiences with MS medication. METHODS:Using the names of medicinal products as search terms, we collected posts from 21 publicly available pregnancy forums, which were accessed between March 2015 and March 2018. After the identification of relevant posts, we analyzed the content of each post using a content analysis technique and categorized the main topics that users discussed most frequently. RESULTS:We identified 6 main topics in 70 social media posts. These topics were as follows: (1) expressing personal experiences with MS medication use during the reproductive period (55/70, 80%), (2) seeking and sharing advice about the use of medicines (52/70, 74%), (3) progression of MS during and after pregnancy (35/70, 50%), (4) discussing concerns about MS medications during the reproductive period (35/70, 50%), (5) querying the possibility of breastfeeding while taking MS medications (30/70, 42%), and (6) commenting on communications with physicians (26/70, 37%). CONCLUSIONS:Overall, many pregnant women or women considering pregnancy shared profound uncertainties and specific concerns about taking medicines during the reproductive period. There is a significant need to provide advice and guidance to MS patients concerning the use of medicines in pregnancy and postpartum as well as during breastfeeding. Advice must be tailored to the circumstances of each patient and, of course, to the individual medicine. Information must be provided by a trusted source with relevant expertise and made publicly available.

Project description:As e-cigarette use rapidly increases in popularity, data from online social systems (Twitter, Instagram, Google Web Search) can be used to capture and describe the social and environmental context in which individuals use, perceive, and are marketed this tobacco product. Social media data may serve as a massive focus group where people organically discuss e-cigarettes unprimed by a researcher, without instrument bias, captured in near real time and at low costs.This study documents e-cigarette-related discussions on Twitter, describing themes of conversations and locations where Twitter users often discuss e-cigarettes, to identify priority areas for e-cigarette education campaigns. Additionally, this study demonstrates the importance of distinguishing between social bots and human users when attempting to understand public health-related behaviors and attitudes.E-cigarette-related posts on Twitter (N=6,185,153) were collected from December 24, 2016, to April 21, 2017. Techniques drawn from network science were used to determine discussions of e-cigarettes by describing which hashtags co-occur (concept clusters) in a Twitter network. Posts and metadata were used to describe where geographically e-cigarette-related discussions in the United States occurred. Machine learning models were used to distinguish between Twitter posts reflecting attitudes and behaviors of genuine human users from those of social bots. Odds ratios were computed from 2x2 contingency tables to detect if hashtags varied by source (social bot vs human user) using the Fisher exact test to determine statistical significance.Clusters found in the corpus of hashtags from human users included behaviors (eg, #vaping), vaping identity (eg, #vapelife), and vaping community (eg, #vapenation). Additional clusters included products (eg, #eliquids), dual tobacco use (eg, #hookah), and polysubstance use (eg, #marijuana). Clusters found in the corpus of hashtags from social bots included health (eg, #health), smoking cessation (eg, #quitsmoking), and new products (eg, #ismog). Social bots were significantly more likely to post hashtags that referenced smoking cessation and new products compared to human users. The volume of tweets was highest in the Mid-Atlantic (eg, Pennsylvania, New Jersey, Maryland, and New York), followed by the West Coast and Southwest (eg, California, Arizona and Nevada).Social media data may be used to complement and extend the surveillance of health behaviors including tobacco product use. Public health researchers could harness these data and methods to identify new products or devices. Furthermore, findings from this study demonstrate the importance of distinguishing between Twitter posts from social bots and humans when attempting to understand attitudes and behaviors. Social bots may be used to perpetuate the idea that e-cigarettes are helpful in cessation and to promote new products as they enter the marketplace.

Project description:BackgroundCommunicable diseases pose a severe threat to public health and economic growth. The traditional methods that are used for public health surveillance, however, involve many drawbacks, such as being labor intensive to operate and resulting in a lag between data collection and reporting. To effectively address the limitations of these traditional methods and to mitigate the adverse effects of these diseases, a proactive and real-time public health surveillance system is needed. Previous studies have indicated the usefulness of performing text mining on social media.ObjectiveTo conduct a systematic review of the literature that used textual content published to social media for the purpose of the surveillance and prediction of communicable diseases.MethodologyBroad search queries were formulated and performed in four databases. Both journal articles and conference materials were included. The quality of the studies, operationalized as reliability and validity, was assessed. This qualitative systematic review was guided by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.ResultsTwenty-three publications were included in this systematic review. All studies reported positive results for using textual social media content to surveille communicable diseases. Most studies used Twitter as a source for these data. Influenza was studied most frequently, while other communicable diseases received far less attention. Journal articles had a higher quality (reliability and validity) than conference papers. However, studies often failed to provide important information about procedures and implementation.ConclusionText mining of health-related content published on social media can serve as a novel and powerful tool for the automated, real-time, and remote monitoring of public health and for the surveillance and prediction of communicable diseases in particular. This tool can address limitations related to traditional surveillance methods, and it has the potential to supplement traditional methods for public health surveillance.

Dataset Information

Discovering Cohorts of Pregnant Women From Social Media for Safety Surveillance and Analysis.

Publications

Discovering Cohorts of Pregnant Women From Social Media for Safety Surveillance and Analysis.

Similar Datasets

OmicsDI is part of the ELIXIR infrastructure

Tweets