Dataset Information

Use of artificial intelligence for image analysis in breast cancer screening programmes: systematic review of test accuracy.

ABSTRACT:

Objective

To examine the accuracy of artificial intelligence (AI) for the detection of breast cancer in mammography screening practice.

Design

Systematic review of test accuracy studies.

Data sources

Medline, Embase, Web of Science, and Cochrane Database of Systematic Reviews from 1 January 2010 to 17 May 2021.

Eligibility criteria

Studies reporting test accuracy of AI algorithms, alone or in combination with radiologists, to detect cancer in women's digital mammograms in screening practice, or in test sets. Reference standard was biopsy with histology or follow-up (for screen negative women). Outcomes included test accuracy and cancer type detected.

Study selection and synthesis

Two reviewers independently assessed articles for inclusion and assessed the methodological quality of included studies using the QUality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool. A single reviewer extracted data, which were checked by a second reviewer. Narrative data synthesis was performed.

Results

Twelve studies totalling 131 822 screened women were included. No prospective studies measuring test accuracy of AI in screening practice were found. Studies were of poor methodological quality. Three retrospective studies compared AI systems with the clinical decisions of the original radiologist, including 79 910 women, of whom 1878 had screen detected cancer or interval cancer within 12 months of screening. Thirty four (94%) of 36 AI systems evaluated in these studies were less accurate than a single radiologist, and all were less accurate than consensus of two or more radiologists. Five smaller studies (1086 women, 520 cancers) at high risk of bias and low generalisability to the clinical context reported that all five evaluated AI systems (as standalone to replace radiologist or as a reader aid) were more accurate than a single radiologist reading a test set in the laboratory. In three studies, AI used for triage screened out 53%, 45%, and 50% of women at low risk but also 10%, 4%, and 0% of cancers detected by radiologists.
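
The proportions quoted in the Results can be checked with simple arithmetic. The sketch below is illustrative only and is not part of the review's methods: the helper name `percentage` and the variable names are assumptions, and the inputs are the counts stated in the Results paragraph above. It also derives the share of women with cancer in the two groups of studies, which is a calculation from the reported counts rather than a figure given by the authors.

```python
def percentage(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to two decimal places."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100 * part / whole, 2)


# Counts taken from the Results paragraph; variable names are hypothetical.
# AI systems that were less accurate than a single radiologist (34 of 36).
less_accurate_share = percentage(34, 36)

# Women with screen detected or interval cancer in the three large
# retrospective studies (1878 of 79 910).
retrospective_cancer_share = percentage(1878, 79910)

# Cancers in the five smaller test set studies (520 of 1086 women).
test_set_cancer_share = percentage(520, 1086)

print(less_accurate_share, retrospective_cancer_share, test_set_cancer_share)
```

Under these counts, the first value rounds to the 94% quoted in the text, and the share of women with cancer is far higher in the small test sets than in the large retrospective cohorts.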

Conclusions

Current evidence for AI does not yet allow judgement of its accuracy in breast cancer screening programmes, and it is unclear where on the clinical pathway AI might be of most benefit. AI systems are not sufficiently specific to replace radiologist double reading in screening programmes. Promising results in smaller studies are not replicated in larger studies. Prospective studies are required to measure the effect of AI in clinical practice. Such studies will require clear stopping rules to ensure that AI does not reduce programme specificity.

Study registration

Protocol registered as PROSPERO CRD42020213590.

SUBMITTER: Freeman K

PROVIDER: S-EPMC8409323 | biostudies-literature | 2021 Sep

REPOSITORIES: biostudies-literature

Publications

Use of artificial intelligence for image analysis in breast cancer screening programmes: systematic review of test accuracy.

Freeman K, Geppert J, Stinton C, Todkill D, Johnson S, Clarke A, Taylor-Phillips S

BMJ (Clinical research ed.), 1 September 2021

PMID: 34470740

Similar Datasets

Project description: Retrospective studies on artificial intelligence (AI) in screening for diabetic retinopathy (DR) have shown promising results in addressing the mismatch between the capacity to implement DR screening and increasing DR incidence. This review sought to evaluate the diagnostic test accuracy (DTA) of AI in screening for referable diabetic retinopathy (RDR) in real-world settings. We searched CENTRAL, PubMed, CINAHL, Scopus, and Web of Science on 9 February 2023. We included prospective DTA studies assessing AI against trained human graders (HGs) in screening for RDR in patients with diabetes. Two reviewers independently extracted data and assessed methodological quality against QUADAS-2 criteria. We used the hierarchical summary receiver operating characteristics (HSROC) model to pool estimates of sensitivity and specificity, and forest plots and SROC plots to visually examine heterogeneity in accuracy estimates. From our initial search results of 3899 studies, we included 15 studies comprising 17 datasets. Meta-analyses revealed a sensitivity of 95.33% (95%CI: 90.60-100%) and specificity of 92.01% (95%CI: 87.61-96.42%) for patient-level analysis (10 datasets, N = 45,785), while for the eye-level analysis, sensitivity was 91.24% (95%CI: 79.15-100%) and specificity was 93.90% (95%CI: 90.63-97.16%) (7 datasets, N = 15,390). Subgroup analyses showed no variation in diagnostic accuracy by country classification or DR classification criteria. However, a moderate increase in diagnostic accuracy was observed in primary-level healthcare settings (sensitivity of 99.35% (95%CI: 96.85-100%), specificity of 93.72% (95%CI: 88.83-98.61%)) and a minimal decrease in tertiary-level healthcare settings (sensitivity of 94.71% (95%CI: 89.00-100%), specificity of 90.88% (95%CI: 83.22-98.53%)). Sensitivity analyses did not show any variation in studies that included diabetic macular edema in the RDR definition, nor in studies with ≥3 HGs. 
This review provides evidence, for the first time from prospective studies, for the effectiveness of AI in screening for RDR in real-world settings. The results may serve to strengthen existing guidelines to improve current practices.

Project description: Conventional radiography (CR) is primarily utilized for fracture diagnosis. Artificial intelligence (AI) for CR is a rapidly growing field aimed at enhancing efficiency and increasing diagnostic accuracy. However, the diagnostic performance of commercially available AI fracture detection solutions (CAAI-FDS) for CR in various anatomical regions, their synergy with human assessment, as well as the influence of industry funding on reported accuracy are unknown. Peer-reviewed diagnostic test accuracy (DTA) studies were identified through a systematic review on PubMed and Embase. Diagnostic performance measures were extracted, particularly for subgroups such as product, type of rater (stand-alone AI, human unaided, human aided), funding, and anatomical region. Pooled measures were obtained with a bivariate random effects model. The impact of rater was evaluated with comparative meta-analysis. Seventeen DTA studies of seven CAAI-FDS analyzing 38,978 x-rays with 8,150 fractures were included. Stand-alone AI studies (n = 15) evaluated five CAAI-FDS; four with good sensitivities (> 90%) and moderate specificities (80-90%) and one with very poor sensitivity (< 60%) and excellent specificity (> 95%). Pooled sensitivities were good to excellent, and specificities were moderate to good in all anatomical regions (n = 7) apart from ribs (n = 4; poor sensitivity / moderate specificity) and spine (n = 4; excellent sensitivity / poor specificity). Funded studies (n = 4) had higher sensitivity (+ 5%) and lower specificity (-4%) than non-funded studies (n = 11). Sensitivity did not differ significantly between stand-alone AI and human AI aided ratings (p = 0.316) but specificity was significantly higher in the latter group (p < 0.001). 
Sensitivity was significantly lower in human unaided ratings than in either human AI aided or stand-alone AI ratings (both p ≤ 0.001); specificity was higher in human unaided ratings than in stand-alone AI ratings (p < 0.001) and did not differ significantly from human AI aided ratings (p = 0.316). The study demonstrates good diagnostic accuracy across most CAAI-FDS and anatomical regions, with the highest performance achieved when used in conjunction with human assessment. Diagnostic accuracy appears lower for spine and rib fractures. The impact of industry funding on reported performance is small.

Project description: Background: Artificial Intelligence (AI) has been used to automate detection of retinal diseases from retinal images with great success, in particular for screening for diabetic retinopathy, a major complication of diabetes. Since persons with diabetes routinely receive retinal imaging to evaluate their diabetic retinopathy status, AI-based retinal imaging may have potential to be used as an opportunistic comprehensive screening for multiple systemic micro- and macro-vascular complications of diabetes. Methods: We conducted a qualitative systematic review of published literature using AI on retina images to detect systemic diabetes complications. We searched three main databases: PubMed, Google Scholar, and Web of Science (January 1, 2000, to October 1, 2024). Research that used AI to evaluate the associations between retinal images and diabetes-associated complications, or research involving diabetes patients with retinal imaging and AI systems, was included. Our primary focus was on articles related to AI, retinal images, and diabetes-associated complications. We evaluated the robustness of each study by the development of the AI algorithm, the size and quality of the training dataset, internal validation and external testing, and performance. Quality assessments were employed to ensure the inclusion of high-quality studies, and data extraction was conducted systematically to gather pertinent information for analysis. This study has been registered on PROSPERO under the registration ID CRD42023493512. Findings: From a total of 337 abstracts, 38 studies were included. These studies covered a range of topics related to prediction of diabetes from pre-diabetes or non-diabetic individuals (n = 4), diabetes related systemic risk factors (n = 10), detection of microvascular complications (n = 8) and detection of macrovascular complications (n = 17). 
Most studies (n = 32) utilized color fundus photographs (CFP) as retinal image modality, while others employed optical coherence tomography (OCT) (n = 6). The performance of the AI systems varied, with an AUC ranging from 0.676 to 0.971 in prediction or identification of different complications. Study designs included cross-sectional and cohort studies with sample sizes ranging from 100 to over 100,000 participants. Risk of bias was evaluated by using the Newcastle-Ottawa Scale and AXIS, with most studies scoring as low to moderate risk. Interpretation: Our review highlights the potential for the use of AI algorithms applied to retina images, particularly CFP, to screen, predict, or diagnose the various microvascular and macrovascular complications of diabetes. However, we identified few studies with longitudinal data and a paucity of randomized controlled trials, reflecting a gap between the development of AI algorithms and real-world implementation and translational studies. Funding: Dr. Gavin Siew Wei TAN is supported by: 1. DYNAMO: Diabetes studY on Nephropathy And other Microvascular cOmplications II supported by National Medical Research Council (MOH-001327-03): data collection, analysis, trial design 2. Prognostic significance of novel multimodal imaging markers for diabetic retinopathy: towards improving the staging for diabetic retinopathy supported by NMRC Clinician Scientist Award (CSA)-Investigator (INV) (MOH-001047-00).
