Dataset Information

CoQUAD: a COVID-19 question answering dataset system, facilitating research, benchmarking, and practice.


ABSTRACT:

Background

Due to the growing volume of COVID-19 research literature, medical experts, clinical scientists, and researchers frequently struggle to stay up to date on the most recent findings. There is a pressing need to help researchers and practitioners mine this literature and answer COVID-19-related questions in a timely manner.

Methods

This paper introduces CoQUAD, a question-answering system that efficiently extracts answers to COVID-19-related questions. Two datasets are provided in this work: a reference-standard dataset built using the CORD-19 and LitCOVID initiatives, and a gold-standard dataset prepared by public health domain experts. CoQUAD has a Retriever component, based on the BM25 algorithm, that searches the reference-standard dataset for documents relevant to a COVID-19-related question. CoQUAD also has a Reader component, a Transformer-based model named MPNet, that reads the paragraphs of the retrieved documents and finds the answers to the question. In comparison to previous works, the proposed CoQUAD system can answer questions related to early, mid, and post-COVID-19 topics.
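
The Retriever step described above can be illustrated with a minimal, standard-library sketch of Okapi BM25 ranking. This is not the authors' implementation: the tokenizer, the parameter values `k1=1.5` and `b=0.75`, the class name `BM25`, and the three toy documents are all assumptions made for illustration. The Reader step (MPNet) is not shown, since it requires a pretrained model.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (an assumed, simplistic tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Toy Okapi BM25 ranker over an in-memory list of documents (hypothetical helper)."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1 = k1  # term-frequency saturation (assumed default)
        self.b = b    # document-length normalization strength (assumed default)
        self.docs = [tokenize(d) for d in docs]
        self.tfs = [Counter(d) for d in self.docs]
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        # Document frequency: number of documents containing each term.
        df = Counter()
        for d in self.docs:
            df.update(set(d))
        n = len(self.docs)
        # Smoothed IDF variant that is always non-negative.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query, i):
        """BM25 score of document i for the query."""
        tf = self.tfs[i]
        dl = len(self.docs[i])
        total = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            f = tf[term]
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def top_k(self, query, k=1):
        """Indices of the k highest-scoring documents, best first."""
        order = sorted(range(len(self.docs)), key=lambda i: self.score(query, i), reverse=True)
        return order[:k]


# Toy corpus standing in for the reference-standard dataset (invented sentences).
docs = [
    "Masks reduce transmission of the virus.",
    "Vaccines induce antibodies against the spike protein.",
    "Long COVID symptoms include fatigue.",
]
retriever = BM25(docs)
best = retriever.top_k("Which vaccines induce antibodies?", k=1)
print(docs[best[0]])
```

In a retriever-reader pipeline of this kind, the top-ranked documents would then be passed to the Reader model, which selects an answer span from their paragraphs.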

Results

Extensive experiments on the CoQUAD Retriever and Reader modules show that CoQUAD provides effective and relevant answers to COVID-19-related questions posed in natural language. Compared with state-of-the-art baselines, CoQUAD achieves higher accuracy, with an exact match ratio score of 77.50% and an F1 score of 77.10%.
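
The abstract does not state how the exact match and F1 scores were computed. As a rough illustration only, the sketch below follows the convention widely used for extractive question answering (SQuAD-style answer normalization and token-overlap F1); the function names and the example strings are assumptions, not taken from the paper.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(pred, gold):
    """1.0 if the normalized prediction equals the normalized gold answer, else 0.0."""
    return float(normalize(pred) == normalize(gold))


def f1_score(pred, gold):
    """Token-level F1 between the normalized prediction and gold answer."""
    p = normalize(pred).split()
    g = normalize(gold).split()
    if not p or not g:
        # If either side is empty, score 1.0 only when both are empty.
        return float(p == g)
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(p)
    recall = overlap / len(g)
    return 2 * precision * recall / (precision + recall)


# Invented prediction/gold pair for illustration.
print(exact_match("The spike protein.", "spike protein"))
print(f1_score("spike protein of virus", "spike protein"))
```

Under this convention, dataset-level scores are the mean of the per-question values, usually taking the maximum over the available gold answers for each question.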

Conclusion

CoQUAD is a question-answering system that mines COVID-19 literature using natural language processing techniques to help the research community find the most recent findings and answer any related questions.

SUBMITTER: Raza S 

PROVIDER: S-EPMC9160513 | biostudies-literature | 2022 Jun

REPOSITORIES: biostudies-literature

Publications

CoQUAD: a COVID-19 question answering dataset system, facilitating research, benchmarking, and practice.

Raza S, Schwartz B, Rosella LC

BMC Bioinformatics, 2022 Jun 2, issue 1


Similar Datasets

| S-EPMC11491595 | biostudies-literature
| S-EPMC11663219 | biostudies-literature
| S-EPMC8041998 | biostudies-literature
| S-EPMC10102790 | biostudies-literature
| S-EPMC9079685 | biostudies-literature
| S-EPMC10991373 | biostudies-literature
| S-EPMC4572360 | biostudies-literature
| S-EPMC4307891 | biostudies-literature
| S-EPMC9344839 | biostudies-literature
| S-EPMC5857288 | biostudies-literature