Dataset Information

Comparison of Large Language Models in Answering Immuno-Oncology Questions: A Cross-Sectional Study.


ABSTRACT:

Background

The capability of large language models (LLMs) to understand and generate human-readable text has prompted the investigation of their potential as educational and management tools for cancer patients and healthcare providers.

Materials and methods

We conducted a cross-sectional study to evaluate the ability of ChatGPT-4, ChatGPT-3.5, and Google Bard to answer questions related to four domains of immuno-oncology (Mechanisms, Indications, Toxicities, and Prognosis). We generated 60 open-ended questions (15 per domain). Questions were manually submitted to the LLMs, and responses were collected on June 30th, 2023. Two reviewers evaluated the answers independently.

Results

ChatGPT-4 and ChatGPT-3.5 answered all questions, whereas Google Bard answered only 53.3% (p < 0.0001). The proportion of questions with reproducible answers was higher for ChatGPT-4 (95%) and ChatGPT-3.5 (88.3%) than for Google Bard (50%) (p < 0.0001). In terms of accuracy, the proportion of answers deemed fully correct was 75.4%, 58.5%, and 43.8% for ChatGPT-4, ChatGPT-3.5, and Google Bard, respectively (p = 0.03). Furthermore, the proportion of responses deemed highly relevant was 71.9%, 77.4%, and 43.8% for ChatGPT-4, ChatGPT-3.5, and Google Bard, respectively (p = 0.04). Regarding readability, the proportion of highly readable responses was higher for ChatGPT-4 (98.1%) and ChatGPT-3.5 (100%) than for Google Bard (87.5%) (p = 0.02).
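
As a reading aid, the percentages for answered and reproducible questions can be converted back into question counts. The sketch below is not part of the study: it assumes that both metrics are expressed over all 60 questions, which the abstract does not state explicitly, and the names `reported_percent`, `implied_count`, and `implied_counts` are hypothetical. The accuracy, relevance, and readability figures are left out because their denominators are not given in the abstract.

```python
# Assumption: both metrics below use all 60 questions as the denominator
# (4 domains x 15 questions, as described in Materials and methods).
TOTAL_QUESTIONS = 4 * 15

# Percentages copied from the Results paragraph of the abstract.
reported_percent = {
    "answered": {"ChatGPT-4": 100.0, "ChatGPT-3.5": 100.0, "Google Bard": 53.3},
    "reproducible": {"ChatGPT-4": 95.0, "ChatGPT-3.5": 88.3, "Google Bard": 50.0},
}


def implied_count(percent, total=TOTAL_QUESTIONS):
    """Return the whole number of questions closest to `percent` of `total`."""
    return round(percent / 100 * total)


def implied_counts(table, total=TOTAL_QUESTIONS):
    """Apply implied_count to every metric/model cell of a percentage table."""
    return {
        metric: {model: implied_count(pct, total) for model, pct in by_model.items()}
        for metric, by_model in table.items()
    }


counts = implied_counts(reported_percent)

for metric, by_model in counts.items():
    for model, n in by_model.items():
        # Recompute the percentage from the count to check it against the abstract.
        recomputed = round(100 * n / TOTAL_QUESTIONS, 1)
        print(f"{metric:12s} {model:12s} {n:2d}/{TOTAL_QUESTIONS} = {recomputed}%")
```

Under this assumption, each implied count reproduces the reported percentage when rounded to one decimal place, which is consistent with a denominator of 60 but does not confirm it.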

Conclusion

ChatGPT-4 and ChatGPT-3.5 are potentially powerful tools in immuno-oncology, whereas Google Bard demonstrated relatively poorer performance. However, the risk of inaccuracy or incompleteness in the responses was evident in all three LLMs, highlighting the importance of expert-driven verification of the outputs returned by these technologies.

SUBMITTER: Iannantuono GM 

PROVIDER: S-EPMC10705618 | biostudies-literature | 2023 Oct

REPOSITORIES: biostudies-literature

Publications

Comparison of Large Language Models in Answering Immuno-Oncology Questions: A Cross-Sectional Study.

Iannantuono GM, Bracken-Clarke D, Karzai F, Choo-Wosoba H, Gulley JL, Floudas CS

medRxiv: the preprint server for health sciences, 2023-10-31


Similar Datasets

| S-EPMC11306009 | biostudies-literature
| S-EPMC11462564 | biostudies-literature
| S-EPMC11922739 | biostudies-literature
| S-EPMC11339526 | biostudies-literature
| S-EPMC10988356 | biostudies-literature
| S-EPMC11923074 | biostudies-literature
| S-EPMC11224745 | biostudies-literature
| S-EPMC10252924 | biostudies-literature