Dataset Information

Diagnostic accuracy of deep learning in medical imaging: a systematic review and meta-analysis.

ABSTRACT: Deep learning (DL) has the potential to transform medical diagnostics. However, the diagnostic accuracy of DL is uncertain. Our aim was to evaluate the diagnostic accuracy of DL algorithms to identify pathology in medical imaging. Searches were conducted in Medline and EMBASE up to January 2020. We identified 11,921 studies, of which 503 were included in the systematic review. Eighty-two studies in ophthalmology, 82 in breast disease and 115 in respiratory disease were included for meta-analysis. Two hundred twenty-four studies in other specialities were included for qualitative review. Peer-reviewed studies that reported on the diagnostic accuracy of DL algorithms to identify pathology using medical imaging were included. Primary outcomes were measures of diagnostic accuracy, study design and reporting standards in the literature. Estimates were pooled using random-effects meta-analysis. In ophthalmology, AUCs ranged between 0.933 and 1 for diagnosing diabetic retinopathy, age-related macular degeneration and glaucoma on retinal fundus photographs and optical coherence tomography. In respiratory imaging, AUCs ranged between 0.864 and 0.937 for diagnosing lung nodules or lung cancer on chest X-ray or CT scan. For breast imaging, AUCs ranged between 0.868 and 0.909 for diagnosing breast cancer on mammogram, ultrasound, MRI and digital breast tomosynthesis. Heterogeneity was high between studies and extensive variation in methodology, terminology and outcome measures was noted. This can lead to an overestimation of the diagnostic accuracy of DL algorithms on medical imaging. There is an immediate need for the development of artificial intelligence-specific EQUATOR guidelines, particularly STARD, in order to provide guidance around key issues in this field.
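The abstract states that estimates were pooled using random-effects meta-analysis but does not name the estimator. As a minimal sketch only, the block below assumes the DerSimonian-Laird method applied to per-study estimates on the logit scale; the function name `pool_random_effects` and all input values are hypothetical and are not taken from the dataset.

```python
import math

def pool_random_effects(effects, variances):
    """DerSimonian-Laird random-effects pooling (assumed estimator, see lead-in)."""
    k = len(effects)
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2

# Hypothetical logit-sensitivities and their variances from four studies
logits = [1.8, 2.4, 2.0, 3.1]
variances = [0.04, 0.09, 0.05, 0.12]
pooled, se, tau2 = pool_random_effects(logits, variances)
sensitivity = 1 / (1 + math.exp(-pooled))  # back-transform to a proportion
```

Pooling on the logit scale keeps the back-transformed estimate between 0 and 1; diagnostic accuracy reviews often use bivariate or hierarchical models instead, which this sketch does not attempt.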

SUBMITTER: Aggarwal R

PROVIDER: S-EPMC8027892 | biostudies-literature

REPOSITORIES: biostudies-literature


Similar Datasets

Project description: Importance: Systematic reviews of medical imaging diagnostic test accuracy (DTA) studies are affected by between-study heterogeneity due to a range of factors. Failure to appropriately assess the extent and causes of heterogeneity compromises the interpretability of systematic review findings. Objective: To assess how heterogeneity has been examined in medical imaging DTA studies. Evidence review: The PubMed database was searched for systematic reviews of medical imaging DTA studies that performed a meta-analysis. The search was limited to the 40 journals with highest impact factor in the radiology, nuclear medicine, and medical imaging category in the InCites Journal Citation Reports of 2021 to reach a sample size of 200 to 300 included studies. Descriptive analysis was performed to characterize the imaging modality, target condition, type of meta-analysis model used, strategies for evaluating heterogeneity, and sources of heterogeneity identified. Multivariable logistic regression was performed to assess whether any factors were associated with at least 1 source of heterogeneity being identified in the included meta-analyses. Methodological quality evaluation was not performed. Data analysis occurred from October to December 2022. Findings: A total of 242 meta-analyses involving a median (range) of 987 (119-441,510) patients across a diverse range of disease categories and imaging modalities were included. The extent of heterogeneity was adequately described (ie, whether it was absent, low, moderate, or high) in 220 studies (91%) and was most commonly assessed using the I² statistic (185 studies [76%]) and forest plots (181 studies [75%]). Heterogeneity was rated as moderate to high in 191 studies (79%). Of all included meta-analyses, 122 (50%) performed subgroup analysis and 87 (36%) performed meta-regression. Of the 242 studies assessed, 189 (78%) included 10 or more primary studies. Of these 189 studies, 60 (32%) did not perform meta-regression or subgroup analysis. Reasons for being unable to investigate sources of heterogeneity included inadequate reporting of primary study characteristics and a low number of included primary studies. Use of meta-regression was associated with identification of at least 1 source of variability (odds ratio, 1.90; 95% CI, 1.11-3.23; P = .02). Conclusions and relevance: In this systematic review of assessment of heterogeneity in medical imaging DTA meta-analyses, most meta-analyses were impacted by a moderate to high level of heterogeneity, presenting interpretive challenges. These findings suggest that, despite the development and availability of more rigorous statistical models, heterogeneity assessment appeared to be incomplete, inconsistently evaluated, or methodologically questionable in many cases, which lessened the interpretability of the analyses performed. Comprehensive heterogeneity assessment should be addressed at the author level, by improving personal familiarity with appropriate statistical methodology for assessing heterogeneity and involving biostatisticians and epidemiologists in study design, as well as at the editorial level, by mandating adherence to methodologic standards in primary DTA studies and DTA meta-analyses.
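For readers unfamiliar with the I² statistic mentioned in this description, the following is a minimal sketch of how it is conventionally derived from Cochran's Q, using inverse-variance weights. The function name `heterogeneity` and the input values are hypothetical; the reviewed meta-analyses may have computed I² under different models.

```python
def heterogeneity(effects, variances):
    """Return Cochran's Q, its degrees of freedom, and I² as a percentage."""
    w = [1.0 / v for v in variances]  # inverse-variance weights
    mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - mean) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    # I² = share of total variation attributed to between-study differences
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i2

# Hypothetical effect estimates (e.g. log diagnostic odds ratios) and variances
q, df, i2 = heterogeneity([2.1, 2.9, 1.4, 3.3], [0.10, 0.15, 0.08, 0.20])
print(f"Q = {q:.2f} on {df} df, I² = {i2:.0f}%")
```

Because I² is truncated at zero, a Q value at or below its degrees of freedom is reported as 0% rather than as a negative percentage.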

Project description: Background: Coronary artery disease (CAD) is a leading cause of death worldwide, and the diagnostic process comprises invasive testing with coronary angiography and non-invasive imaging, in addition to history, clinical examination, and electrocardiography (ECG). A highly accurate assessment of CAD lies in perfusion imaging, which is performed by myocardial perfusion scintigraphy (MPS) and magnetic resonance imaging (stress CMR). Recently, deep learning has been increasingly applied to perfusion imaging for better understanding of the diagnosis, safety, and outcome of CAD. The aim of this review is to summarise the evidence behind deep learning applications in myocardial perfusion imaging. Methods: A systematic search was performed on MEDLINE and EMBASE databases, from database inception until September 29, 2020. This included all clinical studies focusing on deep learning applications and myocardial perfusion imaging, and excluded competition conference papers, simulation and animal studies, and studies which used perfusion imaging as a variable with a different focus. This was followed by review of abstracts and full texts. A meta-analysis was performed on a subgroup of studies which looked at perfusion image classification. A summary receiver operating characteristic (SROC) curve was used to compare the performance of different models, and area under the curve (AUC) was reported. Effect size, risk of bias and heterogeneity were tested. Results: A total of 46 studies were identified, the majority of which were MPS studies (76%). The most common neural network was the convolutional neural network (CNN) (41%). Thirteen studies (28%) looked at perfusion imaging classification using MPS; the pooled diagnostic accuracy showed AUC = 0.859. The SROC comparison showed superior performance of CNN (AUC = 0.894) compared to MLP (AUC = 0.848). The funnel plot was asymmetrical, and the effect size was significantly different (p < 0.001), indicating a small-study effect and possible publication bias. There was no significant heterogeneity amongst studies according to the Q test (p = 0.2184). Conclusion: Deep learning has shown promise to improve myocardial perfusion imaging diagnostic accuracy, prediction of patients' events and safety. More research is required in clinical applications to achieve better care for patients with known or suspected CAD.

Project description: Background: The application of deep learning to medical imaging is growing in prevalence in the recent literature. One of the most studied areas is coronary artery disease (CAD). Imaging of coronary artery anatomy is fundamental, which has led to a high number of publications describing a variety of techniques. The aim of this systematic review is to review the evidence behind the accuracy of deep learning applications in coronary anatomy imaging. Methods: The search for relevant studies that applied deep learning to coronary anatomy imaging was performed systematically on MEDLINE and EMBASE databases, followed by review of abstracts and full texts. The data from the final studies was retrieved using data extraction forms. A meta-analysis was performed on a subgroup of studies which looked at fractional flow reserve (FFR) prediction. Heterogeneity was tested using τ², I² and Q tests. Finally, a risk of bias assessment was performed using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) approach. Results: A total of 81 studies met the inclusion criteria. The most common imaging modality was coronary computed tomography angiography (CCTA) (58%) and the most common deep learning method was the convolutional neural network (CNN) (52%). The majority of studies demonstrated good performance metrics. The most common outputs were focused on coronary artery segmentation, clinical outcome prediction, coronary calcium quantification and FFR prediction, and most studies reported area under the curve (AUC) of ≥80%. The pooled diagnostic odds ratio (DOR) derived from 8 studies looking at FFR prediction using CCTA was 12.5 using the Mantel-Haenszel (MH) method. There was no significant heterogeneity amongst studies according to the Q test (P = 0.2496). Conclusions: Deep learning has been used in many applications in coronary anatomy imaging, most of which are yet to be externally validated and prepared for clinical use. The performance of deep learning, especially CNN models, proved to be powerful, and some applications have already translated into medical practice, such as computed tomography (CT)-FFR. These applications have the potential to translate technology into better care of CAD patients.
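The description reports a pooled diagnostic odds ratio obtained with the Mantel-Haenszel method but gives no per-study tables. As an illustrative sketch, the block below applies the standard Mantel-Haenszel odds-ratio estimator to 2x2 diagnostic tables; the function name `mantel_haenszel_dor` and the counts are hypothetical and do not reproduce the 8 studies in the review.

```python
def mantel_haenszel_dor(tables):
    """Mantel-Haenszel pooled diagnostic odds ratio.

    Each table is (tp, fp, fn, tn): true positives, false positives,
    false negatives and true negatives for one study.
    """
    num = 0.0
    den = 0.0
    for tp, fp, fn, tn in tables:
        n = tp + fp + fn + tn
        num += tp * tn / n  # concordant cross-product, weighted by 1/n
        den += fp * fn / n  # discordant cross-product, weighted by 1/n
    return num / den

# Hypothetical counts from three studies (not data from the review)
tables = [(45, 12, 9, 60), (30, 8, 10, 52), (70, 20, 15, 95)]
print(round(mantel_haenszel_dor(tables), 2))
```

The estimator is undefined when every study has a zero in the false-positive or false-negative cell, so real analyses typically apply a continuity correction first; that step is omitted here.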

Project description: Purpose: Early clinical recognition of sepsis can be challenging. With the advancement of machine learning, promising real-time models to predict sepsis have emerged. We assessed their performance by carrying out a systematic review and meta-analysis. Methods: A systematic search was performed in PubMed, Embase.com and Scopus. Studies targeting sepsis, severe sepsis or septic shock in any hospital setting were eligible for inclusion. The index test was any supervised machine learning model for real-time prediction of these conditions. Quality of evidence was assessed using the Grading of Recommendations Assessment, Development and Evaluation (GRADE) methodology, with a tailored Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) checklist to evaluate risk of bias. Models with a reported area under the receiver operating characteristic curve (AUROC) were meta-analyzed to identify the strongest contributors to model performance. Results: After screening, a total of 28 papers were eligible for synthesis, from which 130 models were extracted. The majority of papers developed models in the intensive care unit (ICU, n = 15; 54%), followed by hospital wards (n = 7; 25%), the emergency department (ED, n = 4; 14%) and all of these settings (n = 2; 7%). For the prediction of sepsis, diagnostic test accuracy assessed by the AUROC ranged from 0.68 to 0.99 in the ICU, 0.96 to 0.98 in-hospital and 0.87 to 0.97 in the ED. Varying sepsis definitions limit pooling of the performance across studies. Only three papers clinically implemented models, with mixed results. In the multivariate analysis, temperature, lab values, and model type contributed most to model performance. Conclusion: This systematic review and meta-analysis show that on retrospective data, individual machine learning models can accurately predict sepsis onset ahead of time. Although they present alternatives to traditional scoring systems, between-study heterogeneity limits the assessment of pooled results. Systematic reporting and clinical implementation studies are needed to bridge the gap between bytes and bedside.

