Dataset Information

Human-annotated dataset for social media sentiment analysis for Albanian language

ABSTRACT: Social media has been heavily used by people in different countries to express their opinions about crises, especially during the Covid-19 pandemic. This dataset was created by collecting people's comments on news items posted on the official Facebook page of the National Institute of Public Health of Kosovo. The dataset contains a total of 10,132 human-annotated comments in Albanian, a low-resource language. The comments were collected from March 12, 2020, which coincides with the first confirmed Covid-19 case in Kosovo, until August 31, 2020, when the second wave started. Given the scarcity of labeled data for low-resource languages, the dataset can be used by the research community in machine learning, information retrieval, and affective computing, as well as by public agencies and decision makers.

SUBMITTER: Kadriu F

PROVIDER: S-EPMC9272335 | biostudies-literature

REPOSITORIES: biostudies-literature

Similar Datasets

Project description: Objective: To examine current vaccine sentiment on social media by constructing and analyzing semantic networks of vaccine information from highly shared websites of Twitter users in the United States; and to assist public health communication of vaccines. Background: Vaccine hesitancy continues to contribute to suboptimal vaccination coverage in the United States, posing significant risk of disease outbreaks, yet remains poorly understood. Methods: We constructed semantic networks of vaccine information from internet articles shared by Twitter users in the United States. We analyzed resulting network topology, compared semantic differences, and identified the most salient concepts within networks expressing positive, negative, and neutral vaccine sentiment. Results: The semantic network of positive vaccine sentiment demonstrated greater cohesiveness in discourse compared to the larger, less-connected network of negative vaccine sentiment. The positive sentiment network centered around parents and focused on communicating health risks and benefits, highlighting medical concepts such as measles, autism, HPV vaccine, vaccine-autism link, meningococcal disease, and MMR vaccine. In contrast, the negative network centered around children and focused on organizational bodies such as CDC, vaccine industry, doctors, mainstream media, pharmaceutical companies, and United States. The prevalence of negative vaccine sentiment was demonstrated through diverse messaging, framed around skepticism and distrust of government organizations that communicate scientific evidence supporting positive vaccine benefits. Conclusion: Semantic network analysis of vaccine sentiment in online social media can enhance understanding of the scope and variability of current attitudes and beliefs toward vaccines. Our study synthesizes quantitative and qualitative evidence from an interdisciplinary approach to better understand complex drivers of vaccine hesitancy for public health communication, to improve vaccine confidence and vaccination coverage in the United States.

Project description: Background: Although vaccination rates are above the threshold for herd immunity in South Korea, a growing number of parents have expressed concerns about the safety of vaccines. It is important to understand these concerns so that we can maintain high vaccination rates. Objective: The aim of this study was to develop a childhood vaccination ontology to serve as a framework for collecting and analyzing social data on childhood vaccination and to use this ontology for identifying concerns about and sentiments toward childhood vaccination from social data. Methods: The domain and scope of the ontology were determined by developing competency questions. We checked if existing ontologies and conceptual frameworks related to vaccination can be reused for the childhood vaccination ontology. Terms were collected from clinical practice guidelines, research papers, and posts on social media platforms. Class concepts were extracted from these terms. A class hierarchy was developed using a top-down approach. The ontology was evaluated in terms of description logics, face and content validity, and coverage. In total, 40,359 Korean posts on childhood vaccination were collected from 27 social media channels between January and December 2015. Vaccination issues were identified and classified using the second-level class concepts of the ontology. The sentiments were classified in 3 ways: positive, negative or neutral. Posts were analyzed using frequency, trend, logistic regression, and association rules. Results: Our childhood vaccination ontology comprised 9 superclasses with 137 subclasses and 431 synonyms for class, attribute, and value concepts. Parent's health belief appeared in 53.21% (15,709/29,521) of posts and positive sentiments appeared in 64.08% (17,454/27,236) of posts. Trends in sentiments toward vaccination were affected by news about vaccinations. Posts with parents' health belief, vaccination availability, and vaccination policy were associated with positive sentiments, whereas posts with experience of vaccine adverse events were associated with negative sentiments. Conclusions: The childhood vaccination ontology developed in this study was useful for collecting and analyzing social data on childhood vaccination. We expect that practitioners and researchers in the field of childhood vaccination could use our ontology to identify concerns about and sentiments toward childhood vaccination from social data.

Project description: Background: Endometriosis is a debilitating and difficult-to-diagnose gynecological disease. Owing to limited information and awareness, women often rely on social media platforms as a support system to engage in discussions regarding their disease-related concerns. Objective: This study aimed to apply computational techniques to social media posts to identify discussion topics about endometriosis and to identify themes that require more attention from health care professionals and researchers. We also aimed to explore whether, amid the challenging nature of the disease, there are themes within the endometriosis community that gather posts with positive sentiments. Methods: We retrospectively extracted posts from the subreddits r/Endo and r/endometriosis from January 2011 to April 2022. We analyzed 45,693 Reddit posts using sentiment analysis and topic modeling-based methods in machine learning. Results: Since 2011, the number of posts and comments has increased steadily. The posts were categorized into 11 categories, and the highest number of posts were related to either asking for information (Question); sharing the experiences (Rant/Vent); or diagnosing and treating endometriosis, especially surgery (Surgery related). Sentiment analysis revealed that 92.09% (42,077/45,693) of posts were associated with negative sentiments, only 2.3% (1053/45,693) expressed positive feelings, and there were no categories with more positive than negative posts. Topic modeling revealed 27 major topics, and the most popular topics were Surgery, Questions/Advice, Diagnosis, and Pain. The Survey/Research topic, which brought together most research-related posts, was the last in terms of posts. Conclusions: Our study shows that posts on social media platforms can provide insights into the concerns of women with endometriosis symptoms. The analysis of the posts confirmed that women with endometriosis have to face negative emotions and pain daily. The large number of posts related to asking questions shows that women do not receive sufficient information from physicians and need community support to cope with the disease. Health care professionals should pay more attention to the symptoms and diagnosis of endometriosis, discuss these topics with patients to reduce their dissatisfaction with doctors, and contribute more to the overall well-being of women with endometriosis. Researchers should also become more involved in social media and share new science-based knowledge regarding endometriosis.
