Dataset Information

Artificial Intelligence to Automate Network Meta-Analyses: Four Case Studies to Evaluate the Potential Application of Large Language Models.

ABSTRACT:

Background

The emergence of artificial intelligence, capable of human-level performance on some tasks, presents an opportunity to revolutionise development of systematic reviews and network meta-analyses (NMAs). In this pilot study, we aim to assess use of a large-language model (LLM, Generative Pre-trained Transformer 4 [GPT-4]) to automatically extract data from publications, write an R script to conduct an NMA and interpret the results.

Methods

We considered four case studies involving binary and time-to-event outcomes in two disease areas, for which an NMA had previously been conducted manually. For each case study, a Python script was developed that communicated with the LLM via application programming interface (API) calls. The LLM was prompted to extract relevant data from publications, to create an R script to be used to run the NMA and then to produce a small report describing the analysis.

Results

The LLM had a > 99% success rate of accurately extracting data across 20 runs for each case study and could generate R scripts that could be run end-to-end without human input. It also produced good quality reports describing the disease area, analysis conducted, results obtained and a correct interpretation of the results.

Conclusions

This study provides a promising indication of the feasibility of using current generation LLMs to automate data extraction, code generation and NMA result interpretation, which could result in significant time savings and reduce human error. This is provided that routine technical checks are performed, as recommend for human-conducted analyses. Whilst not currently 100% consistent, LLMs are likely to improve with time.

SUBMITTER: Reason T

PROVIDER: S-EPMC10884375 | biostudies-literature | 2024 Mar

REPOSITORIES: biostudies-literature

ACCESS DATA

Publications

Artificial Intelligence to Automate Network Meta-Analyses: Four Case Studies to Evaluate the Potential Application of Large Language Models.

Reason Tim T Benbow Emma E Langham Julia J Gimblett Andy A Klijn Sven L SL Malcolm Bill B

PharmacoEconomics - open 20240210 2

<h4>Background</h4>The emergence of artificial intelligence, capable of human-level performance on some tasks, presents an opportunity to revolutionise development of systematic reviews and network meta-analyses (NMAs). In this pilot study, we aim to assess use of a large-language model (LLM, Generative Pre-trained Transformer 4 [GPT-4]) to automatically extract data from publications, write an R script to conduct an NMA and interpret the results.<h4>Methods</h4>We considered four case studies i ...[more]

PMID: 38340277

Similar Datasets

Project description:BackgroundCurrent generation large language models (LLMs) such as Generative Pre-Trained Transformer 4 (GPT-4) have achieved human-level performance on many tasks including the generation of computer code based on textual input. This study aimed to assess whether GPT-4 could be used to automatically programme two published health economic analyses.MethodsThe two analyses were partitioned survival models evaluating interventions in non-small cell lung cancer (NSCLC) and renal cell carcinoma (RCC). We developed prompts which instructed GPT-4 to programme the NSCLC and RCC models in R, and which provided descriptions of each model's methods, assumptions and parameter values. The results of the generated scripts were compared to the published values from the original, human-programmed models. The models were replicated 15 times to capture variability in GPT-4's output.ResultsGPT-4 fully replicated the NSCLC model with high accuracy: 100% (15/15) of the artificial intelligence (AI)-generated NSCLC models were error-free or contained a single minor error, and 93% (14/15) were completely error-free. GPT-4 closely replicated the RCC model, although human intervention was required to simplify an element of the model design (one of the model's fifteen input calculations) because it used too many sequential steps to be implemented in a single prompt. With this simplification, 87% (13/15) of the AI-generated RCC models were error-free or contained a single minor error, and 60% (9/15) were completely error-free. Error-free model scripts replicated the published incremental cost-effectiveness ratios to within 1%.ConclusionThis study provides a promising indication that GPT-4 can have practical applications in the automation of health economic model construction. Potential benefits include accelerated model development timelines and reduced costs of development. Further research is necessary to explore the generalisability of LLM-based automation across a larger sample of models.

Project description:BackgroundAt a time when Internet and social media use is omnipresent among patients in their self-directed research about their medical or surgical needs, artificial intelligence (AI) large language models (LLMs) are on track to represent hallmark resources in this context.ObjectivesThe authors aim to explore and assess the performance of a novel AI LLM in answering questions posed by simulated patients interested in aesthetic breast plastic surgery procedures.MethodsA publicly available AI LLM was queried using simulated interactions from the perspective of patients interested in breast augmentation, mastopexy, and breast reduction. Questions posed were standardized and categorized under aesthetic needs inquiries and awareness of appropriate procedures; patient candidacy and indications; procedure safety and risks; procedure information, steps, and techniques; patient assessment; preparation for surgery; postprocedure instructions and recovery; and procedure cost and surgeon recommendations. Using standardized Likert scales ranging from 1 to 10, 4 expert breast plastic surgeons evaluated responses provided by AI. A postparticipation survey assessed expert evaluators' experience with LLM technology, perceived utility, and limitations.ResultsThe overall performance across all question categories, assessment criteria, and procedures examined was 7.3/10 ± 0.5. Overall accuracy of information shared was scored at 7.1/10 ± 0.5; comprehensiveness at 7.0/10 ± 0.6; objectivity at 7.5/10 ± 0.4; safety at 7.5/10 ± 0.4; communication clarity at 7.3/10 ± 0.2; and acknowledgment of limitations at 7.7/10 ± 0.2. With regards to performance on procedures examined, the model's overall score was 7.0/10 ± 0.8 for breast augmentation; 7.6/10 ± 0.5 for mastopexy; and 7.4/10 ± 0.5 for breast reduction. The score on breast implant-specific knowledge was 6.7/10 ± 0.6.ConclusionsAlbeit not without limitations, AI LLMs represent promising resources for patient guidance and patient education. The technology's machine learning capabilities may explain its improved performance efficiency.Level of evidence 4

Project description:With breakthroughs in Natural Language Processing and Artificial Intelligence (AI), the usage of Large Language Models (LLMs) in academic research has increased tremendously. Models such as Generative Pre-trained Transformer (GPT) are used by researchers in literature review, abstract screening, and manuscript drafting. However, these models also present the attendant challenge of providing ethically questionable scientific information. Our study provides a snapshot of global researchers' perception of current trends and future impacts of LLMs in research. Using a cross-sectional design, we surveyed 226 medical and paramedical researchers from 59 countries across 65 specialties, trained in the Global Clinical Scholars' Research Training certificate program of Harvard Medical School between 2020 and 2024. Majority (57.5%) of these participants practiced in an academic setting with a median of 7 (2,18) PubMed Indexed published articles. 198 respondents (87.6%) were aware of LLMs and those who were aware had higher number of publications (p < 0.001). 18.7% of the respondents who were aware (n = 37) had previously used LLMs in publications especially for grammatical errors and formatting (64.9%); however, most (40.5%) did not acknowledge its use in their papers. 50.8% of aware respondents (n = 95) predicted an overall positive future impact of LLMs while 32.6% were unsure of its scope. 52% of aware respondents (n = 102) believed that LLMs would have a major impact in areas such as grammatical errors and formatting (66.3%), revision and editing (57.2%), writing (57.2%) and literature review (54.2%). 58.1% of aware respondents were opined that journals should allow for use of AI in research and 78.3% believed that regulations should be put in place to avoid its abuse. Seeing the perception of researchers towards LLMs and the significant association between awareness of LLMs and number of published works, we emphasize the importance of developing comprehensive guidelines and ethical framework to govern the use of AI in academic research and address the current challenges.

Project description:BackgroundThe rapid integration of Artificial Intelligence (AI) into the healthcare field has occurred with little communication between computer scientists and doctors. The impact of AI on health outcomes and inequalities calls for health professionals and data scientists to make a collaborative effort to ensure historic health disparities are not encoded into the future. We present a study that evaluates bias in existing Natural Language Processing (NLP) models used in psychiatry and discuss how these biases may widen health inequalities. Our approach systematically evaluates each stage of model development to explore how biases arise from a clinical, data science and linguistic perspective.Design/methodsA literature review of the uses of NLP in mental health was carried out across multiple disciplinary databases with defined Mesh terms and keywords. Our primary analysis evaluated biases within 'GloVe' and 'Word2Vec' word embeddings. Euclidean distances were measured to assess relationships between psychiatric terms and demographic labels, and vector similarity functions were used to solve analogy questions relating to mental health.ResultsOur primary analysis of mental health terminology in GloVe and Word2Vec embeddings demonstrated significant biases with respect to religion, race, gender, nationality, sexuality and age. Our literature review returned 52 papers, of which none addressed all the areas of possible bias that we identify in model development. In addition, only one article existed on more than one research database, demonstrating the isolation of research within disciplinary silos and inhibiting cross-disciplinary collaboration or communication.ConclusionOur findings are relevant to professionals who wish to minimize the health inequalities that may arise as a result of AI and data-driven algorithms. We offer primary research identifying biases within these technologies and provide recommendations for avoiding these harms in the future.

Dataset Information

Artificial Intelligence to Automate Network Meta-Analyses: Four Case Studies to Evaluate the Potential Application of Large Language Models.

Background

Methods

Results

Conclusions

Publications

Artificial Intelligence to Automate Network Meta-Analyses: Four Case Studies to Evaluate the Potential Application of Large Language Models.

Similar Datasets

OmicsDI is part of the ELIXIR infrastructure

Tweets