Dataset Information

Text classification for assisting moderators in online health communities.

ABSTRACT:

Objectives

Patients increasingly visit online health communities to get help managing their health. The large scale of these online communities makes it impossible for the moderators to engage in all conversations; yet, some conversations need their expertise. Our work applies low-cost text classification methods to this new domain of determining whether a thread in an online health forum needs moderators' help.

Methods

We employed a binary classifier on WebMD's online diabetes community data. To train the classifier, we considered three feature types: (1) word unigrams, (2) sentiment analysis features, and (3) thread length. We applied feature selection methods based on χ² statistics, and undersampling to account for unbalanced data. We then performed a qualitative error analysis to investigate the appropriateness of the gold standard.
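The two preprocessing steps named above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes whitespace tokenization, treats each unigram as a present/absent feature, scores it with the standard 2x2 contingency χ² statistic, and balances classes by random undersampling of the majority class. The function names `chi2_scores` and `undersample` and the toy threads are hypothetical.

```python
import random
from collections import Counter


def chi2_scores(docs, labels):
    """Chi-square statistic of each unigram's presence against a binary label."""
    n = len(docs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    present_pos, present_neg = Counter(), Counter()
    for text, y in zip(docs, labels):
        # Assumption: lowercase whitespace tokens, counted once per document.
        for tok in set(text.lower().split()):
            (present_pos if y else present_neg)[tok] += 1
    scores = {}
    for tok in set(present_pos) | set(present_neg):
        a = present_pos[tok]   # positive docs containing tok
        b = present_neg[tok]   # negative docs containing tok
        c = n_pos - a          # positive docs without tok
        d = n_neg - b          # negative docs without tok
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[tok] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def undersample(docs, labels, seed=0):
    """Randomly drop majority-class documents until both classes are equal in size."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y]
    neg = [i for i, y in enumerate(labels) if not y]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = sorted(minority + rng.sample(majority, len(minority)))
    return [docs[i] for i in keep], [labels[i] for i in keep]


# Toy threads: label 1 means the thread needed a moderator's help.
docs = ["please help insulin dose", "thanks everyone", "help with dose",
        "great recipe", "nice recipe thanks", "lovely day"]
labels = [1, 0, 1, 0, 0, 0]

scores = chi2_scores(docs, labels)
top_terms = sorted(scores, key=scores.get, reverse=True)[:3]
balanced_docs, balanced_labels = undersample(docs, labels)
```

Keeping only the highest-scoring unigrams shrinks the feature space, and undersampling keeps the classifier from defaulting to the majority class; how many features to keep is a tuning choice that this sketch does not address.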

Results

Using sentiment analysis features, feature selection methods, and balanced training data increased the AUC value up to 0.75 and the F1-score up to 0.54 compared to the baseline of using word unigrams with no feature selection methods on unbalanced data (0.65 AUC and 0.40 F1-score). The error analysis uncovered additional reasons for why moderators respond to patients' posts.
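For readers unfamiliar with the two reported metrics, the sketch below shows how F1 and AUC are conventionally computed. It is illustrative only: the labels and scores are made-up toy values, the 0.5 decision threshold is an assumption, and AUC is computed with the pairwise (rank-based) definition rather than any particular library routine.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t]
    neg = [s for t, s in zip(y_true, scores) if not t]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 1 = thread needed a moderator, scores are classifier confidences.
y_true = [1, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # assumed threshold of 0.5

auc = roc_auc(y_true, y_score)
f1 = f1_score(y_true, y_pred)
```

AUC summarizes ranking quality across all thresholds, while F1 depends on the single threshold chosen, which is why the two numbers can move differently when the training data are rebalanced.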

Discussion

We showed how feature selection methods and balanced training data can improve the overall classification performance. We present implications of weighing precision versus recall for assisting moderators of online health communities. Our error analysis uncovered social, legal, and ethical issues around addressing community members' needs. We also note challenges in producing a gold standard, and discuss potential solutions for addressing these challenges.

Conclusion

Social media environments provide popular venues in which patients gain health-related information. Our work contributes to understanding scalable solutions for providing moderators' expertise in these large-scale, social media environments.

SUBMITTER: Huh J

PROVIDER: S-EPMC3874858 | biostudies-literature | 2013 Dec

REPOSITORIES: biostudies-literature

Publications

Text classification for assisting moderators in online health communities.

Huh J, Yetisgen-Yildiz M, Pratt W

Journal of Biomedical Informatics, 2013-09-08, issue 6

PMID: 24025513

Similar Datasets

Project description: Automated monitoring of websites that trade wildlife is increasingly necessary to inform conservation and biosecurity efforts. However, e-commerce and wildlife trading websites can contain a vast number of advertisements, an unknown proportion of which may be irrelevant to researchers and practitioners. Given that many wildlife-trade advertisements have an unstructured text format, automated identification of relevant listings has not traditionally been possible, nor attempted. Other scientific disciplines have solved similar problems using machine learning and natural language processing models, such as text classifiers. Here, we test the ability of a suite of text classifiers to extract relevant advertisements from wildlife trade occurring on the Internet. We collected data from an Australian classifieds website where people can post advertisements of their pet birds (n = 16.5k advertisements). We found that text classifiers can predict, with a high degree of accuracy, which listings are relevant (ROC AUC ≥ 0.98, F1 score ≥ 0.77). Furthermore, in an attempt to answer the question 'how much data is required to have an adequately performing model?', we conducted a sensitivity analysis by simulating decreases in sample sizes to measure the subsequent change in model performance. From our sensitivity analysis, we found that text classifiers required a minimum sample size of 33% (c. 5.5k listings) to accurately identify relevant listings (for our dataset), providing a reference point for future applications of this sort. Our results suggest that text classification is a viable tool that can be applied to the online trade of wildlife to reduce time dedicated to data cleaning. However, the success of text classifiers will vary depending on the advertisements and websites, and will therefore be context dependent. Further work to integrate other machine learning tools, such as image classification, may provide better predictive abilities in the context of streamlining data processing for wildlife trade related online data.

Project description: Background: Online health community (OHC) moderators help facilitate conversations and provide information to members. However, the necessity of the moderator in helping members achieve goals by providing the support they need remains unclear, with some prior research suggesting that moderation is unnecessary or even harmful for close-knit OHCs. Similarly, members' perceptions of moderator roles are underexplored. Starting in January 2013, WebMD moderators stopped working for WebMD communities. This event provided an opportunity for us to study the perceived role of moderators in OHCs. Objective: We examine OHC members' perceptions of OHC moderators by studying their reactions toward the departure of moderators in their communities. We also analyzed the relative posting activity on OHCs before and after the departure of moderators from the communities among all members and those who discussed moderators' departures. Methods: We applied a mixed-methods approach to study the posts of all 55 moderated WebMD communities by querying the terms relating to discussions surrounding moderators' disappearance from the WebMD community. We performed open and axial coding and affinity diagramming to thematically analyze patients' reactions to the disappeared moderators. The number of posts and poster groups (members and moderators) were analyzed over time to understand posting patterns around moderators' departure. Results: Of 821 posts retrieved under 95 threads, a total of 166 open codes were generated. The codes were then grouped into 2 main themes with 6 total subthemes. First, patients attempted to understand why moderators had left and what could be done to fill the void left by the missing moderators. During these discussions, the posts revealed that patients believed that moderators played critical roles in the communities by making the communities vibrant and healthy, finding solutions, and giving medical information. Some patients felt personally attached to moderators, expressing that they would cease their community participation. On the other hand, patients also indicated that moderators were not useful or sometimes even harmful for peer interactions. The overall communities' posting activity, which was already in decline, showed no significant difference before and after the moderators' departure. In fact, the overall posting activities of the communities were declining well before the moderators' departure. These declining posting activities might be the reason why WebMD removed the moderators. Conclusion: Compassionate moderators who provide medical expertise, control destructive member posts, and help answer questions can provide important support for patient engagement in OHCs. Moderators are in general received positively by community members and do not appear to interfere with peer interactions. Members are well aware of the possibility of misinformation spreading in OHCs. Further investigation into the attitudes of less vocal community members should be conducted.

Project description: Background: Patient education materials given to breast cancer survivors may not be a good fit for their information needs. Needs may change over time, be forgotten, or be misreported, for a variety of reasons. An automated content analysis of survivors' postings to online health forums can identify expressed information needs over a span of time and be repeated regularly at low cost. Identifying these unmet needs can guide improvements to existing education materials and the creation of new resources. Objective: The primary goals of this project are to assess the unmet information needs of breast cancer survivors from their own perspectives and to identify gaps between information needs and current education materials. Methods: This approach employs computational methods for content modeling and supervised text classification to data from online health forums to identify explicit and implicit requests for health-related information. Potential gaps between needs and education materials are identified using techniques from information retrieval. Results: We provide a new taxonomy for the classification of sentences in online health forum data. 260 postings from two online health forums were selected, yielding 4179 sentences for coding. After annotation of data and training alternative one-versus-others classifiers, a random forest-based approach achieved F1 scores from 66% (Other, dataset2) to 90% (Medical, dataset1) on the primary information types. 136 expressions of need were used to generate queries to indexed education materials. Upon examination of the best two pages retrieved for each query, 12% (17/136) of queries were found to have relevant content by all coders, and 33% (45/136) were judged to have relevant content by at least one. Conclusions: Text from online health forums can be analyzed effectively using automated methods. Our analysis confirms that breast cancer survivors have many information needs that are not covered by the written documents they typically receive, as our results suggest that at most a third of breast cancer survivors' questions would be addressed by the materials currently provided to them.

Project description: Background: Intimate partner violence (IPV) is an underreported public health crisis that primarily affects women, is associated with severe health conditions, and can lead to a high rate of homicide. Owing to the COVID-19 pandemic, more women with IPV experiences visited online health communities (OHCs) to seek help because of anonymity. However, little is known regarding whether their help requests were answered and whether the information provided was delivered in an appropriate manner. To understand the help-seeking information sought and given in OHCs, extraction of postings and linguistic features could be helpful to develop automated models to improve future help-seeking experiences. Objective: The objective of this study was to examine the types and patterns (ie, communication styles) of the advice offered by OHC members and whether the information received from women matched their expressed needs in their initial postings. Methods: We examined Reddit posts from the subreddit community r/domesticviolence from November 14, 2020, through November 14, 2021, during the COVID-19 pandemic. We included posts from women aged ≥18 years who self-identified or described experiencing IPV and requested advice or help in this subreddit community. Posts from nonabused women and women aged <18 years, non-English posts, good news announcements, gratitude posts without any advice seeking, and posts related to advertisements were excluded. We developed a codebook and annotated the postings in an iterative manner. Initial posts were also quantified using Linguistic Inquiry and Word Count to categorize linguistic and posting features. Postings were then classified into 2 categories (ie, matched needs and unmatched needs) according to the types of help sought and received in OHCs to capture the help-seeking result. Nonparametric statistical analysis (ie, 2-tailed t test or Mann-Whitney U test) was used to compare the linguistic and posting features between matched and unmatched needs. Results: Overall, 250 postings were included, and 200 (80%) posting response comments matched with the type of help requested in initial postings, with legal advice and IPV knowledge achieving the highest matching rate. Overall, 17 linguistic or posting features were found to be significantly different between the 2 groups (ie, matched help and unmatched help). Positive title sentiment and linguistic features in postings containing health and wellness wordings were associated with unmatched needs postings, whereas the other 14 features were associated with postings with matched needs. Conclusions: OHCs can extract the linguistic and posting features to understand the help-seeking result among women with IPV experiences. Features identified in this corpus reflected the differences found between the 2 groups. This is the first study that leveraged Linguistic Inquiry and Word Count to shed light on generating predictive features from unstructured text in OHCs, which could guide future algorithm development to detect help-seeking results within OHCs effectively.

Project description: Background: In online mental health communities, the interactions among members can significantly reduce their psychological distress and enhance their mental well-being. The overall quality of support from others varies due to differences in people's capacities to help others. This results in some support seekers' needs being met, while others remain unresolved. Objective: This study aimed to examine which characteristics of the comments posted to provide support can make support seekers feel better (ie, result in cognitive change). Methods: We used signaling theory to model the factors affecting cognitive change and used consulting strategies from the offline, face-to-face psychological counseling process to construct 6 characteristics: intimacy, emotional polarity, the use of first-person words, the use of future-tense words, specificity, and language style. Through text mining and natural language processing (NLP) technology, we identified linguistic features in online text and conducted an empirical analysis using 12,868 online mental health support reply data items from Zhihu to verify the effectiveness of those features. Results: The findings showed that support comments are more likely to alter support seekers' cognitive processes if those comments have lower intimacy (β_intimacy=-1.706, P<.001), higher positive emotional polarity (β_emotional_polarity=.890, P<.001), lower specificity (β_specificity=-.018, P<.001), more first-person words (β_first-person=.120, P<.001), more future- and present-tense words (β_future-words=.301, P<.001), and fewer function words (β_linguistic_style=-.838, P<.001). The result is consistent with psychotherapists' psychotherapeutic strategy in offline counseling scenarios. Conclusions: Our research contributes to both theory and practice by proposing a model to reveal the factors that make support seekers feel better. The findings have significance for support providers. Additionally, our study offers pointers for managing and designing online communities for mental health.
