Dataset Information

Using machine learning of clinical data to diagnose COVID-19: a systematic review and meta-analysis.

ABSTRACT:

Background

The recent Coronavirus Disease 2019 (COVID-19) pandemic has placed severe stress on healthcare systems worldwide, which is amplified by the critical shortage of COVID-19 tests.

Methods

In this study, we propose to generate a more accurate diagnostic model of COVID-19 based on patient symptoms and routine test results by applying machine learning to reanalyze COVID-19 data from 151 published studies. We aim to investigate correlations between clinical variables, cluster COVID-19 patients into subtypes, and generate a computational classification model for discriminating between COVID-19 patients and influenza patients based on clinical variables alone.

Results

We discovered several novel associations between clinical variables, including correlations between being male and having higher levels of serum lymphocytes and neutrophils. We found that COVID-19 patients could be clustered into subtypes based on serum levels of immune cells, gender, and reported symptoms. Finally, we trained an XGBoost model to achieve a sensitivity of 92.5% and a specificity of 97.9% in discriminating COVID-19 patients from influenza patients.

Conclusions

We demonstrated that computational methods trained on large clinical datasets could yield ever more accurate COVID-19 diagnostic models to mitigate the impact of lack of testing. We also presented previously unknown COVID-19 clinical variable correlations and clinical subgroups.

SUBMITTER: Li WT

PROVIDER: S-EPMC7522928 | biostudies-literature

REPOSITORIES: biostudies-literature

Similar Datasets

Project description: BACKGROUND: Clinical narratives represent the main form of communication within health care, providing a personalized account of patient history and assessments, and offering rich information for clinical decision making. Natural language processing (NLP) has repeatedly demonstrated its feasibility to unlock evidence buried in clinical narratives. Machine learning can facilitate rapid development of NLP tools by leveraging large amounts of text data. OBJECTIVE: The main aim of this study was to provide systematic evidence on the properties of text data used to train machine learning approaches to clinical NLP. We also investigated the types of NLP tasks that have been supported by machine learning and how they can be applied in clinical practice. METHODS: Our methodology was based on the guidelines for performing systematic reviews. In August 2018, we used PubMed, a multifaceted interface, to perform a literature search against MEDLINE. We identified 110 relevant studies and extracted information about text data used to support machine learning, NLP tasks supported, and their clinical applications. The data properties considered included their size, provenance, collection methods, annotation, and any relevant statistics. RESULTS: The majority of datasets used to train machine learning models included only hundreds or thousands of documents. Only 10 studies used tens of thousands of documents, with a handful of studies utilizing more. Relatively small datasets were utilized for training even when much larger datasets were available. The main reason for such poor data utilization is the annotation bottleneck faced by supervised machine learning algorithms. Active learning was explored to iteratively sample a subset of data for manual annotation as a strategy for minimizing the annotation effort while maximizing the predictive performance of the model. Supervised learning was successfully used where clinical codes integrated with free-text notes into electronic health records were utilized as class labels. Similarly, distant supervision was used to utilize an existing knowledge base to automatically annotate raw text. Where manual annotation was unavoidable, crowdsourcing was explored, but it remains unsuitable because of the sensitive nature of data considered. Besides the small volume, training data were typically sourced from a small number of institutions, thus offering no hard evidence about the transferability of machine learning models. The majority of studies focused on text classification. Most commonly, the classification results were used to support phenotyping, prognosis, care improvement, resource management, and surveillance. CONCLUSIONS: We identified the data annotation bottleneck as one of the key obstacles to machine learning approaches in clinical NLP. Active learning and distant supervision were explored as a way of saving the annotation efforts. Future research in this field would benefit from alternatives such as data augmentation and transfer learning, or unsupervised learning, which do not require data annotation.

Project description: COVID-19-related acute respiratory distress syndrome (CARDS) has been suggested to differ from the typical ARDS. While distinct phenotypes of ARDS have been identified through latent class analysis (LCA), it is unclear whether such phenotypes exist for CARDS and how they affect clinical outcomes. To address this question, we conducted a systematic review of the current evidence. We searched several databases, including PubMed, EBSCO Host, and Web of Science, from inception to July 1, 2022. Our exposure and outcome of interest were different CARDS phenotypes identified and their associated outcomes, such as 28-day, 90-day, 180-day mortality, ventilator-free days, and other relevant outcomes. We identified four studies comprising a total of 1776 CARDS patients. Of the four studies, three used LCA to identify subphenotypes (SPs) of CARDS. One study based on longitudinal data identified two SPs, with SP2 associated with worse ventilation and mechanical parameters than SP1. The other two studies based on baseline data also identified two SPs, with SP2 and SP1 associated with hyperinflammatory and hypoinflammatory CARDS, respectively. The fourth study identified three SPs primarily stratified by comorbidities using multifactorial analysis. All studies identified a subphenotype associated with poorer outcomes, including mortality, ventilator-free days, multiple-organ injury, and pulmonary embolism. Two studies reported differential responses to corticosteroids among the SPs, with improved mortality in the hyperinflammatory and worse in the hypoinflammatory SPs. Overall, our review highlights the importance of phenotyping in understanding CARDS and its impact on disease management and prognostication. However, a consensus approach to phenotyping is necessary to ensure consistency and comparability across studies. We recommend that randomized clinical trials stratified by phenotype should only be initiated after such consensus is reached. Short title: COVID-19 ARDS subphenotypes and outcomes.

Project description: Background: Normal voice production depends on the synchronized cooperation of multiple physiological systems, which makes the voice sensitive to changes. Any systematic, neurological, and aerodigestive distortion is prone to affect voice production through reduced cognitive, pulmonary, and muscular functionality. This sensitivity inspired using voice as a biomarker to examine disorders that affect the voice. Technological improvements and emerging machine learning (ML) technologies have enabled possibilities of extracting digital vocal features from the voice for automated diagnosis and monitoring systems. Objective: This study aims to summarize a comprehensive view of research on voice-affecting disorders that uses ML techniques for diagnosis and monitoring through voice samples where systematic conditions, nonlaryngeal aerodigestive disorders, and neurological disorders are specifically of interest. Methods: This systematic literature review (SLR) investigated the state of the art of voice-based diagnostic and monitoring systems with ML technologies, targeting voice-affecting disorders without direct relation to the voice box from the point of view of applied health technology. Through a comprehensive search string, studies published from 2012 to 2022 from the databases Scopus, PubMed, and Web of Science were scanned and collected for assessment. To minimize bias, retrieval of the relevant references in other studies in the field was ensured, and 2 authors assessed the collected studies. Low-quality studies were removed through a quality assessment and relevant data were extracted through summary tables for analysis. The articles were checked for similarities between author groups to prevent cumulative redundancy bias during the screening process, where only 1 article was included from the same author group. Results: In the analysis of the 145 included studies, support vector machines were the most utilized ML technique (51/145, 35.2%), with the most studied disease being Parkinson disease (PD; reported in 87/145, 60%, studies). After 2017, 16 additional voice-affecting disorders were examined, in contrast to the 3 investigated previously. Furthermore, an upsurge in the use of artificial neural network-based architectures was observed after 2017. Almost half of the included studies were published in the last 2 years (2021 and 2022). A broad interest from many countries was observed. Notably, nearly one-half (n=75) of the studies relied on 10 distinct data sets, and 11/145 (7.6%) used demographic data as an input for ML models. Conclusions: This SLR revealed considerable interest across multiple countries in using ML techniques for diagnosing and monitoring voice-affecting disorders, with PD being the most studied disorder. However, the review identified several gaps, including limited and unbalanced data set usage in studies, and a focus on diagnostic tests rather than disorder-specific monitoring. Despite the limitations of being constrained by only peer-reviewed publications written in English, the SLR provides valuable insights into the current state of research on ML-based voice-affecting disorder diagnosis and monitoring and highlights areas to address in future research.

Project description: Background: SARS-CoV-2, the novel coronavirus responsible for COVID-19, has caused havoc worldwide, with patients presenting a spectrum of complications that have pushed health care experts to explore new technological solutions and treatment plans. Artificial Intelligence (AI)-based technologies have played a substantial role in solving complex problems, and several organizations have been swift to adopt and customize these technologies in response to the challenges posed by the COVID-19 pandemic. Objective: The objective of this study was to conduct a systematic review of the literature on the role of AI as a comprehensive and decisive technology to fight the COVID-19 crisis in the fields of epidemiology, diagnosis, and disease progression. Methods: A systematic search of PubMed, Web of Science, and CINAHL databases was performed according to PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analysis) guidelines to identify all potentially relevant studies published and made available online between December 1, 2019, and June 27, 2020. The search syntax was built using keywords specific to COVID-19 and AI. Results: The search strategy resulted in 419 articles published and made available online during the aforementioned period. Of these, 130 publications were selected for further analyses. These publications were classified into 3 themes based on AI applications employed to combat the COVID-19 crisis: Computational Epidemiology, Early Detection and Diagnosis, and Disease Progression. Of the 130 studies, 71 (54.6%) focused on predicting the COVID-19 outbreak, the impact of containment policies, and potential drug discoveries, which were classified under the Computational Epidemiology theme. Next, 40 of 130 (30.8%) studies that applied AI techniques to detect COVID-19 by using patients' radiological images or laboratory test results were classified under the Early Detection and Diagnosis theme. Finally, 19 of the 130 studies (14.6%) that focused on predicting disease progression, outcomes (ie, recovery and mortality), length of hospital stay, and number of days spent in the intensive care unit for patients with COVID-19 were classified under the Disease Progression theme. Conclusions: In this systematic review, we assembled studies in the current COVID-19 literature that utilized AI-based methods to provide insights into different COVID-19 themes. Our findings highlight important variables, data types, and available COVID-19 resources that can assist in facilitating clinical and translational research.