Dataset Information

Question-driven summarization of answers to consumer health questions.

ABSTRACT: Automatic summarization of natural language is a widely studied area in computer science, one that is broadly applicable to anyone who needs to understand large quantities of information. In the medical domain, automatic summarization has the potential to make health information more accessible to people without medical expertise. However, to evaluate the quality of summaries generated by summarization algorithms, researchers first require gold-standard, human-generated summaries. Unfortunately, there is no available data for the purpose of assessing summaries that help consumers of health information answer their questions. To address this issue, we present the MEDIQA-Answer Summarization dataset, the first dataset designed for question-driven, consumer-focused summarization. It contains 156 health questions asked by consumers, answers to these questions, and manually generated summaries of these answers. The dataset's unique structure allows it to be used for at least eight different types of summarization evaluations. We also benchmark the performance of baseline and state-of-the-art deep learning approaches on the dataset, demonstrating how it can be used to evaluate automatically generated summaries.

SUBMITTER: Savery M

PROVIDER: S-EPMC7532186 | biostudies-literature | 2020 Oct

REPOSITORIES: biostudies-literature

Publications

Question-driven summarization of answers to consumer health questions.

Max Savery, Asma Ben Abacha, Soumya Gayen, Dina Demner-Fushman

Scientific Data, 2020-10-02, issue 1

PMID: 33009402

Similar Datasets

Project description:
Background: About 6 million people search for health information on the Internet each day in the United States. Both patients and caregivers search for information about prescribed courses of treatments, unanswered questions after a visit to their providers, or diet and exercise regimens. Past literature has indicated potential challenges around quality in health information available on the Internet. However, diverse information exists on the Internet, ranging from government-initiated webpages to personal blog pages. Yet we do not fully understand the strengths and weaknesses of different types of information available on the Internet.
Objective: The objective of this research was to investigate the strengths and challenges of various types of health information available online and to suggest what information sources best fit various question types.
Methods: We collected questions posted to and the responses they received from an online diabetes community and classified them according to Rothwell's classification of question types (fact, policy, or value questions). We selected 60 questions (20 each of fact, policy, and value) and the replies the questions received from the community. We then searched for responses to the same questions using a search engine and recorded the
Results: Community responses answered more questions than did search results overall. Search results were most effective in answering value questions and least effective in answering policy questions. Community responses answered questions across question types at an equivalent rate, but most answered policy questions and the least answered fact questions. Value questions were most answered by community responses, but some of these answers provided by the community were incorrect. Fact question search results were the most clinically valid.
Conclusions: The Internet is a prevalent source of health information for people. The information quality people encounter online can have a large impact on them. We present what kinds of questions people ask online and the advantages and disadvantages of various information sources in getting answers to those questions. This study contributes to addressing people's online health information needs.

Project description: Despite the wealth of mental-health information available online to consumers, research has shown that the mental-health information needs of consumers are not being met. This study contributes to that research by soliciting consumer questions directly, categorizing them, analyzing their form, and assessing the extent to which they can be answered from a trusted and vetted source of online information, namely the website of the US National Institute of Mental Health (NIMH). As an alternative to surveys and analyses of online activity, this study shows how consumer questions provide new insight into what consumers do not know and how they express their information needs. The study crowdsourced 100 consumer questions through Amazon Inc.'s Mechanical Turk. Categorization of the questions shows broad agreement with earlier studies in terms of the content of consumer questions. It also suggests that consumers' grasp of mental health issues may be low compared to other health topics. The majority of the questions (74%) were simple in form, with the remainder being multi-part, multifaceted or narrative. Even simple-form questions could, however, have complex interpretations. Fifty-four questions were submitted to the search box at the NIMH website. For 32 questions, no answer could be found in the top one to three documents returned. Inadequacies in the search and retrieval technology deployed at websites account for some of the failure to find answers. The nature of consumer questions in mental health also plays a role. A question that has a false presupposition is less likely to have an answer in trusted and vetted sources of information. Consumer questions are also expressed with a degree of specificity that makes the retrieval of relevant information difficult. The significance of this study is that it shows what an analysis of consumer mental-health questions can tell us about consumer information needs and it provides new insight into the difficulties facing consumers looking for answers to their questions in online resources.
