Dataset Information

Categorizing Vaccine Confidence With a Transformer-Based Machine Learning Model: Analysis of Nuances of Vaccine Sentiment in Twitter Discourse.

ABSTRACT:

Background

Social media has become an established platform for individuals to discuss and debate various subjects, including vaccination. With growing conversations on the web and less than desired maternal vaccination uptake rates, these conversations could provide useful insights to inform future interventions. However, owing to the volume of web-based posts, manual annotation and analysis are difficult and time consuming. Automated processes for this type of analysis, such as natural language processing, have faced challenges in extracting complex stances such as attitudes toward vaccination from large amounts of text.

Objective

The aim of this study is to build upon recent advances in transformer-based machine learning methods and test whether transformer-based machine learning could be used as a tool to assess the stance expressed in social media posts toward vaccination during pregnancy.

Methods

A total of 16,604 tweets posted between November 1, 2018, and April 30, 2019, were selected using keyword searches related to maternal vaccination. After excluding irrelevant tweets, the remaining tweets were coded by 3 individual researchers into the categories Promotional, Discouraging, Ambiguous, and Neutral or No Stance. After creating a final data set of 2722 unique tweets, multiple machine learning techniques were trained on a part of this data set and then tested and compared with the human annotators.
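The step of training on one part of the coded data set and testing on the rest can be sketched as follows. This is a minimal illustration, not the authors' procedure: the abstract does not state the split ratio or whether the split was stratified, so the 20% test fraction, the stratification by category, the `stratified_split` helper, and the toy tweets are all assumptions.

```python
import random
from collections import defaultdict

# The four stance categories named in the Methods paragraph.
LABELS = ["Promotional", "Discouraging", "Ambiguous", "Neutral or No Stance"]


def stratified_split(examples, test_fraction=0.2, seed=0):
    """Split (text, label) pairs into train and test sets.

    Each label keeps roughly the same share in both sets.
    The 20% default is an assumption, not a value from the study.
    """
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        # Hold out at least one example per label for testing.
        n_test = max(1, round(len(items) * test_fraction))
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


# Toy stand-in for the coded tweets: 10 placeholder texts per category.
examples = [(f"tweet {i}", LABELS[i % 4]) for i in range(40)]
train, test = stratified_split(examples)
print(len(train), len(test))
```

Stratifying matters most when one category is rare, because a purely random split could leave too few of its examples in the test set to measure anything.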

Results

We found the accuracy of the machine learning techniques to be 81.8% (F score=0.78) compared with the agreed score among the 3 annotators. For comparison, the accuracies of the individual annotators compared with the final score were 83.3%, 77.9%, and 77.5%.
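The comparison against the agreed label can be sketched as follows. This is an illustration under stated assumptions: the abstract does not say how the agreed score was reached or how the F score was averaged, so the majority vote in `majority_label`, the macro averaging in `macro_f1`, and the toy annotations are all hypothetical.

```python
from collections import Counter


def majority_label(votes):
    """Return the label chosen by at least 2 of 3 annotators, else None.

    Majority voting is an assumption; the study only refers to an
    "agreed score" among the annotators.
    """
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= 2 else None


def accuracy(predicted, reference):
    """Fraction of items where the predicted label equals the reference."""
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)


def macro_f1(predicted, reference):
    """Unweighted mean of per-label F1 scores (macro averaging, assumed)."""
    labels = sorted(set(reference) | set(predicted))
    scores = []
    for label in labels:
        pairs = list(zip(predicted, reference))
        tp = sum(p == label and r == label for p, r in pairs)
        fp = sum(p == label and r != label for p, r in pairs)
        fn = sum(p != label and r == label for p, r in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy annotations: one tuple of three annotator labels per tweet.
annotations = [
    ("Promotional", "Promotional", "Neutral or No Stance"),
    ("Discouraging", "Discouraging", "Discouraging"),
    ("Ambiguous", "Promotional", "Ambiguous"),
    ("Neutral or No Stance", "Neutral or No Stance", "Promotional"),
]
final = [majority_label(votes) for votes in annotations]

# Hypothetical model output for the same four tweets.
model = ["Promotional", "Discouraging", "Promotional", "Neutral or No Stance"]

model_accuracy = accuracy(model, final)
model_f1 = macro_f1(model, final)
# Each annotator is scored against the same agreed labels.
annotator_accuracy = [
    accuracy([votes[i] for votes in annotations], final) for i in range(3)
]
print(model_accuracy, model_f1, annotator_accuracy)
```

Scoring the model and each annotator against the same agreed labels is what makes the reported figures comparable.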

Conclusions

This study demonstrates that we are able to achieve close to the same accuracy in categorizing tweets using our machine learning models as could be expected from a single human coder. The potential to use this automated process, which is reliable and accurate, could free valuable time and resources for conducting this analysis, in addition to informing potentially effective and necessary interventions.

SUBMITTER: Kummervold PE

PROVIDER: S-EPMC8538052 | biostudies-literature | 2021 Oct

REPOSITORIES: biostudies-literature

Publications

Categorizing Vaccine Confidence With a Transformer-Based Machine Learning Model: Analysis of Nuances of Vaccine Sentiment in Twitter Discourse.

Kummervold, Per E; Martin, Sam; Dada, Sara; Kilich, Eliz; Denny, Chermain; Paterson, Pauline; Larson, Heidi J

JMIR Medical Informatics, 2021 Oct 8, issue 10

PMID: 34623312

Similar Datasets

Project description: Background: The unprecedented speed of COVID-19 vaccine development and approval has raised public concern about its safety. However, studies on public discourses and opinions on social media focusing on adverse events (AEs) related to COVID-19 vaccine are rare. Objective: This study aimed to analyze Korean tweets about COVID-19 vaccines (Pfizer, Moderna, AstraZeneca, Janssen, and Novavax) after the vaccine rollout, explore the topics and sentiments of tweets regarding COVID-19 vaccines, and examine their changes over time. We also analyzed topics and sentiments focused on AEs related to vaccination using only tweets with terms about AEs. Methods: We devised a sophisticated methodology consisting of 5 steps: keyword search on Twitter, data collection, data preprocessing, data analysis, and result visualization. We used the Twitter Representational State Transfer application programming interface for data collection. A total of 1,659,158 tweets were collected from February 1, 2021, to March 31, 2022. Finally, 165,984 data points were analyzed after excluding retweets, news, official announcements, advertisements, duplicates, and tweets with <2 words. We applied a variety of preprocessing techniques that are suitable for the Korean language. We ran a suite of analyses using various Python packages, such as latent Dirichlet allocation, hierarchical latent Dirichlet allocation, and sentiment analysis. Results: The topics related to COVID-19 vaccines have a very large spectrum, including vaccine-related AEs, emotional reactions to vaccination, vaccine development and supply, and government vaccination policies. Among them, the top major topic was AEs related to COVID-19 vaccination. The AEs ranged from the adverse reactions listed in the safety profile (eg, myalgia, fever, fatigue, injection site pain, myocarditis or pericarditis, and thrombosis) to unlisted reactions (eg, irregular menstruation, changes in appetite and sleep, leukemia, and deaths). Our results showed a notable difference in the topics for each vaccine brand. The topics pertaining to the Pfizer vaccine mainly mentioned AEs. Negative public opinion has prevailed since the early stages of vaccination. In the sentiment analysis based on vaccine brand, the topics related to the Pfizer vaccine expressed the strongest negative sentiment. Conclusions: Considering the discrepancy between academic evidence and public opinions related to COVID-19 vaccination, the government should provide accurate information and education. Furthermore, our study suggests the need for management to correct the misinformation related to vaccine-related AEs, especially those affecting negative sentiments. This study provides valuable insights into the public discourses and opinions regarding COVID-19 vaccination.

Project description: Background: Dementia is a global public health priority due to rapid growth of the aging population. As China has the world's largest population with dementia, this debilitating disease has created tremendous challenges for older adults, family caregivers, and health care systems on the mainland nationwide. However, public awareness and knowledge of the disease remain limited in Chinese society. Objective: This study examines online public discourse and sentiment toward dementia among the Chinese public on a leading Chinese social media platform Weibo. Specifically, this study aims to (1) assess and examine public discourse and sentiment toward dementia among the Chinese public, (2) determine the extent to which dementia-related discourse and sentiment vary among different user groups (ie, government, journalists/news media, scientists/experts, and the general public), and (3) characterize temporal trends in public discourse and sentiment toward dementia among different user groups in China over the past decade. Methods: In total, 983,039 original dementia-related posts published by 347,599 unique users between 2010 and 2021, together with their user information, were analyzed. Machine learning analytical techniques, including topic modeling, sentiment analysis, and semantic network analyses, were used to identify salient themes/topics and their variations across different user groups (ie, government, journalists/news media, scientists/experts, and the general public). Results: Topic modeling results revealed that symptoms, prevention, and social support are the most prevalent dementia-related themes on Weibo. Posts about dementia policy/advocacy have been increasing in volume since 2018. Raising awareness is the least discussed topic over time. Sentiment analysis indicated that Weibo users generally attach negative attitudes/emotions to dementia, with the general public holding a more negative attitude than other user groups. Conclusions: Overall, dementia has received greater public attention on social media since 2018. In particular, discussions related to dementia advocacy and policy are gaining momentum in China. However, disparaging language is still used to describe dementia in China; therefore, a nationwide initiative is needed to alter the public discourse on dementia. The results contribute to previous research by providing a macrolevel understanding of the Chinese public's discourse and attitudes toward dementia, which is essential for building national education and policy initiatives to create a dementia-friendly society. Our findings indicate that dementia is associated with negative sentiments, and symptoms and prevention dominate public discourse. The development of strategies to address unfavorable perceptions of dementia requires policy and public health attention. The results further reveal that an urgent need exists to increase public knowledge about dementia. Social media platforms potentially could be leveraged for future dementia education interventions to increase dementia awareness and promote positive attitudes.

Project description: Background: As one of the serious public health issues, vaccination refusal has been attracting more and more attention, especially for newly approved human papillomavirus (HPV) vaccines. Understanding public opinion towards HPV vaccines, especially concerns on social media, is of significant importance for HPV vaccination promotion. Methods: In this study, we leveraged a hierarchical machine learning based sentiment analysis system to extract public opinions towards HPV vaccines from Twitter. English tweets containing HPV vaccines-related keywords were collected from November 2, 2015 to March 28, 2016. Manual annotation was done to evaluate the performance of the system on the unannotated tweets corpus. Subsequent time series analysis was applied to this corpus to track the trends of machine-deduced sentiments and their associations with different days of the week. Results: The evaluation of the unannotated tweets corpus showed that the micro-averaging F scores have reached 0.786. The learning system deduced the sentiment labels for 184,214 tweets in the collected unannotated tweets corpus. Time series analysis identified a coincidence between mainstream outcome and Twitter contents. A weak trend was found for "Negative" tweets that first decreased and later began to increase; an opposite trend was identified for "Positive" tweets. Tweets that contain the worries on efficacy for HPV vaccines showed a relatively significant decreasing trend. Strong associations were found between some sentiments ("Positive", "Negative", "Negative-Safety" and "Negative-Others") with different days of the week. Conclusions: Our efforts on sentiment analysis for newly approved HPV vaccines provide us an automatic and instant way to extract public opinion and understand the concerns on Twitter. Our approaches can provide feedback to public health professionals to monitor online public response, examine the effectiveness of their HPV vaccination promotion strategies and adjust their promotion plans.

Project description: Background: Since COVID-19 vaccines became broadly available to the adult population, sharp divergences in uptake have emerged along partisan lines. Researchers have indicated a polarized social media presence contributing to the spread of mis- or disinformation as being responsible for these growing partisan gaps in uptake. Objective: The major aim of this study was to investigate the role of influential actors in the context of the community structures and discourse related to COVID-19 vaccine conversations on Twitter that emerged prior to the vaccine rollout to the general population and discuss implications for vaccine promotion and policy. Methods: We collected tweets on COVID-19 between July 1, 2020, and July 31, 2020, a time when attitudes toward the vaccines were forming but before the vaccines were widely available to the public. Using network analysis, we identified different naturally emerging Twitter communities based on their internal information sharing. A PageRank algorithm was used to quantitatively measure the level of "influentialness" of Twitter accounts and identify the "influencers," followed by coding them into different actor categories. Inductive coding was conducted to describe discourses shared in each of the 7 communities. Results: Twitter vaccine conversations were highly polarized, with different actors occupying separate "clusters." The antivaccine cluster was the most densely connected group. Among the 100 most influential actors, medical experts were outnumbered both by partisan actors and by activist vaccine skeptics or conspiracy theorists. Scientists and medical actors were largely absent from the conservative network, and antivaccine sentiment was especially salient among actors on the political right. Conversations related to COVID-19 vaccines were highly polarized along partisan lines, with "trust" in vaccines being manipulated to the political advantage of partisan actors. Conclusions: These findings are informative for designing improved vaccine information communication strategies to be delivered on social media especially by incorporating influential actors. Although polarization and echo chamber effect are not new in political conversations in social media, it was concerning to observe these in health conversations on COVID-19 vaccines during the vaccine development process.

Project description: Background: Social media serves as a vast repository of data, offering insights into public perceptions and emotions surrounding significant societal issues. Amid the COVID-19 pandemic, long COVID (formally known as post-COVID-19 condition) has emerged as a chronic health condition, profoundly impacting numerous lives and livelihoods. Given the dynamic nature of long COVID and our evolving understanding of it, effectively capturing people's sentiments and perceptions through social media becomes increasingly crucial. By harnessing the wealth of data available on social platforms, we can better track the evolving narrative surrounding long COVID and the collective efforts to address this pressing issue. Objective: This study aimed to investigate people's perceptions and sentiments around long COVID in Canada, the United States, and Europe, by analyzing English-language tweets from these regions using advanced topic modeling and sentiment analysis techniques. Understanding regional differences in public discourse can inform tailored public health strategies. Methods: We analyzed long COVID-related tweets from 2021. Contextualized topic modeling was used to capture word meanings in context, providing coherent and semantically meaningful topics. Sentiment analysis was conducted in a zero-shot manner using Llama 2, a large language model, to classify tweets into positive, negative, or neutral sentiments. The results were interpreted in collaboration with public health experts, comparing the timelines of topics discussed across the 3 regions. This dual approach enabled a comprehensive understanding of the public discourse surrounding long COVID. We used metrics such as normalized pointwise mutual information for coherence and topic diversity for diversity to ensure robust topic modeling results. Results: Topic modeling identified five main topics: (1) long COVID in people including children in the context of vaccination, (2) duration and suffering associated with long COVID, (3) persistent symptoms of long COVID, (4) the need for research on long COVID treatment, and (5) measuring long COVID symptoms. Significant concern was noted across all regions about the duration and suffering associated with long COVID, along with consistent discussions on persistent symptoms and calls for more research and better treatments. In particular, the topic of persistent symptoms was highly prevalent, reflecting ongoing challenges faced by individuals with long COVID. Sentiment analysis showed a mix of positive and negative sentiments, fluctuating with significant events and news related to long COVID. Conclusions: Our study combines natural language processing techniques, including contextualized topic modeling and sentiment analysis, along with domain expert input, to provide detailed insights into public health monitoring and intervention. These findings highlight the importance of tracking public discourse on long COVID to inform public health strategies, address misinformation, and provide support to affected individuals. The use of social media analysis in understanding public health issues is underscored, emphasizing the role of emerging technologies in enhancing public health responses.

Project description: Background: Vaccination is a cornerstone of the prevention of communicable infectious diseases; however, vaccines have traditionally met with public fear and hesitancy, and COVID-19 vaccines are no exception. Social media use has been demonstrated to play a role in the low acceptance of vaccines. Objective: The aim of this study is to identify the topics and sentiments in the public COVID-19 vaccine-related discussion on social media and discern the salient changes in topics and sentiments over time to better understand the public perceptions, concerns, and emotions that may influence the achievement of herd immunity goals. Methods: Tweets were downloaded from a large-scale COVID-19 Twitter chatter data set from March 11, 2020, the day the World Health Organization declared COVID-19 a pandemic, to January 31, 2021. We used R software to clean the tweets and retain tweets that contained the keywords vaccination, vaccinations, vaccine, vaccines, immunization, vaccinate, and vaccinated. The final data set included in the analysis consisted of 1,499,421 unique tweets from 583,499 different users. We used R to perform latent Dirichlet allocation for topic modeling as well as sentiment and emotion analysis using the National Research Council of Canada Emotion Lexicon. Results: Topic modeling of tweets related to COVID-19 vaccines yielded 16 topics, which were grouped into 5 overarching themes. Opinions about vaccination (227,840/1,499,421 tweets, 15.2%) was the most tweeted topic and remained a highly discussed topic during the majority of the period of our examination. Vaccine progress around the world became the most discussed topic around August 11, 2020, when Russia approved the world's first COVID-19 vaccine. With the advancement of vaccine administration, the topic of instruction on getting vaccines gradually became more salient and became the most discussed topic after the first week of January 2021. Weekly mean sentiment scores showed that despite fluctuations, the sentiment was increasingly positive in general. Emotion analysis further showed that trust was the most predominant emotion, followed by anticipation, fear, sadness, etc. The trust emotion reached its peak on November 9, 2020, when Pfizer announced that its vaccine is 90% effective. Conclusions: Public COVID-19 vaccine-related discussion on Twitter was largely driven by major events about COVID-19 vaccines and mirrored the active news topics in mainstream media. The discussion also demonstrated a global perspective. The increasingly positive sentiment around COVID-19 vaccines and the dominant emotion of trust shown in the social media discussion may imply higher acceptance of COVID-19 vaccines compared with previous vaccines.

Project description: Background: Social media is a rich source where we can learn about people's reactions to social issues. As COVID-19 has impacted people's lives, it is essential to capture how people react to public health interventions and understand their concerns. Objective: We aim to investigate people's reactions and concerns about COVID-19 in North America, especially in Canada. Methods: We analyzed COVID-19-related tweets using topic modeling and aspect-based sentiment analysis (ABSA), and interpreted the results with public health experts. To generate insights on the effectiveness of specific public health interventions for COVID-19, we compared timelines of topics discussed with the timing of implementation of interventions, synergistically including information on people's sentiment about COVID-19-related aspects in our analysis. In addition, to further investigate anti-Asian racism, we compared timelines of sentiments for Asians and Canadians. Results: Topic modeling identified 20 topics, and public health experts provided interpretations of the topics based on top-ranked words and representative tweets for each topic. The interpretation and timeline analysis showed that the discovered topics and their trend are highly related to public health promotions and interventions such as physical distancing, border restrictions, handwashing, staying home, and face coverings. After training the data using ABSA with human-in-the-loop, we obtained 545 aspect terms (eg, "vaccines," "economy," and "masks") and 60 opinion terms such as "infectious" (negative) and "professional" (positive), which were used for inference of sentiments of 20 key aspects selected by public health experts. The results showed negative sentiments related to the overall outbreak, misinformation and Asians, and positive sentiments related to physical distancing. Conclusions: Analyses using natural language processing techniques with domain expert involvement can produce useful information for public health. This study is the first to analyze COVID-19-related tweets in Canada in comparison with tweets in the United States by using topic modeling and human-in-the-loop domain-specific ABSA. This kind of information could help public health agencies to understand public concerns as well as what public health messages are resonating in our populations who use Twitter, which can be helpful for public health agencies when designing a policy for new interventions.
