Dataset Information

Low-Shot Deep Learning of Diabetic Retinopathy With Potential Applications to Address Artificial Intelligence Bias in Retinal Diagnostics and Rare Ophthalmic Diseases.

ABSTRACT:

Importance

Recent studies have demonstrated the successful application of artificial intelligence (AI) for automated retinal disease diagnostics but have not addressed a fundamental challenge for deep learning systems: the current need for large, criterion standard-annotated retinal data sets for training. Low-shot learning algorithms, aiming to learn from a relatively low number of training data, may be beneficial for clinical situations involving rare retinal diseases or when addressing potential bias resulting from data that may not adequately represent certain groups for training, such as individuals older than 85 years.

Objective

To evaluate whether low-shot deep learning methods are beneficial when using small training data sets for automated retinal diagnostics.

Design, setting, and participants

This cross-sectional study, conducted from July 1, 2019, to June 21, 2020, compared different diabetic retinopathy classification algorithms, traditional and low-shot, for 2-class designations (diabetic retinopathy warranting referral vs not warranting referral). The public domain EyePACS data set was used, which originally included 88 692 fundi from 44 346 individuals. Statistical analysis was performed from February 1 to June 21, 2020.
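The low-shot comparison depends on drawing small, class-balanced training subsets from the full data set. The following is a minimal sketch of how such subsets might be drawn, not the authors' procedure. The function name `balanced_subsample`, the fixed seed, and the halving schedule of sizes are all assumptions made for illustration; the abstract states only that sizes ranged from 5120 to 10 samples per class.

```python
import random
from collections import defaultdict


def balanced_subsample(labels, n_per_class, seed=0):
    """Return sorted indices holding exactly n_per_class examples of each label.

    Hypothetical helper for illustration; not taken from the study.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    chosen = []
    for label in sorted(by_class):
        pool = by_class[label]
        if len(pool) < n_per_class:
            raise ValueError(f"class {label!r} has only {len(pool)} examples")
        chosen.extend(rng.sample(pool, n_per_class))  # sampling without replacement
    return sorted(chosen)


# Assumed schedule: halve the per-class size from 5120 down to 10.
sizes = [5120 // 2**k for k in range(10)]

# Toy 2-class labels (0 = not referable, 1 = referable), purely synthetic.
toy_labels = [0] * 30 + [1] * 20
toy_indices = balanced_subsample(toy_labels, n_per_class=10)
```

Sampling within each class separately keeps the two classes equally represented at every training size, so a drop in performance reflects the smaller sample rather than a shift in class balance.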

Main outcomes and measures

The performance (95% CIs) of the various AI algorithms was measured via receiver operating characteristic (ROC) curves and their area under the curve (AUC), precision-recall curves, accuracy, and F1 score, evaluated for different training data sizes, ranging from 5120 to 10 samples per class.
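As an illustration of these measures, the sketch below computes an AUC from labels and scores, attaches an approximate 95% CI, and computes an F1 score. It is a sketch under stated assumptions: the abstract does not say how the CIs were derived, so the Hanley-McNeil standard error used here is one common choice rather than the authors' method, and the function names and toy data are hypothetical.

```python
import math


def auc_mann_whitney(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def hanley_mcneil_ci(auc, n_pos, n_neg, z=1.96):
    """Approximate CI for an AUC using the Hanley-McNeil standard error (assumed method)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc**2 / (1.0 + auc)
    var = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc**2)
        + (n_neg - 1) * (q2 - auc**2)
    ) / (n_pos * n_neg)
    se = math.sqrt(max(var, 0.0))
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


def f1_score(labels, preds):
    """F1 = 2TP / (2TP + FP + FN) for the positive class (label 1)."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data, purely synthetic: 1 = referable, 0 = not referable.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]

toy_auc = auc_mann_whitney(toy_labels, toy_scores)
toy_ci = hanley_mcneil_ci(toy_auc, n_pos=2, n_neg=2)
toy_f1 = f1_score(toy_labels, [1 if s >= 0.5 else 0 for s in toy_scores])
```

With only four examples the interval is very wide, which mirrors the abstract's point that estimates become less reliable as data shrink. The pairwise loop is quadratic in the number of examples and is meant for clarity, not for data sets of the size used in the study.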

Results

Deep learning algorithms, when trained with sufficiently large data sets (5120 samples per class), yielded comparable performance: an AUC of 0.8330 (95% CI, 0.8140-0.8520) for a traditional approach (eg, fine-tuned ResNet) compared with 0.8348 (95% CI, 0.8159-0.8537) for a low-shot method using self-supervised Deep InfoMax (our method, denoted as DIM). However, when far fewer training images were available (n = 160), the AUC of the traditional deep learning approach decreased to 0.6585 (95% CI, 0.6332-0.6838), and it was outperformed by a low-shot method using self-supervision, which had an AUC of 0.7467 (95% CI, 0.7239-0.7695). At very low shots (n = 10), the traditional approach had performance close to chance, with an AUC of 0.5178 (95% CI, 0.4909-0.5447) compared with the best low-shot method (AUC, 0.5778 [95% CI, 0.5512-0.6044]).

Conclusions and relevance

These findings suggest the potential benefits of using low-shot methods for AI retinal diagnostics when a limited number of annotated training retinal images are available (eg, with rare ophthalmic diseases or when addressing potential AI bias).

SUBMITTER: Burlina P

PROVIDER: S-EPMC7489388 | biostudies-literature

REPOSITORIES: biostudies-literature


Similar Datasets

Project description: Background: Diabetic retinopathy (DR) is a leading cause of blindness. Our objective was to evaluate the performance of an artificial intelligence (AI) system integrated into a handheld smartphone-based retinal camera for DR screening using a single retinal image per eye. Methods: Images were obtained from individuals with diabetes during a mass screening program for DR in Blumenau, Southern Brazil, conducted by trained operators. Automatic analysis was conducted using an AI system (EyerMaps™, Phelcom Technologies LLC, Boston, USA) with one macula-centered, 45-degree field of view retinal image per eye. The results were compared to the assessment by a retinal specialist, considered as the ground truth, using two images per eye. Patients with ungradable images were excluded from the analysis. Results: A total of 686 individuals (average age 59.2 ± 13.3 years, 56.7% women, diabetes duration 12.1 ± 9.4 years) were included in the analysis. The rates of insulin use, daily glycemic monitoring, and systemic hypertension treatment were 68.4%, 70.2%, and 70.2%, respectively. Although 97.3% of patients were aware of the risk of blindness associated with diabetes, more than half of them underwent their first retinal examination during the event. The majority (82.5%) relied exclusively on the public health system. Approximately 43.4% of individuals were either illiterate or had not completed elementary school. DR classification based on the ground truth was as follows: absent or nonproliferative mild DR 86.9%, more than mild (mtm) DR 13.1%. The AI system achieved sensitivity, specificity, positive predictive value, and negative predictive value percentages (95% CI) for mtmDR as follows: 93.6% (87.8-97.2), 71.7% (67.8-75.4), 42.7% (39.3-46.2), and 98.0% (96.2-98.9), respectively. The area under the ROC curve was 86.4%. Conclusion: The portable retinal camera combined with AI demonstrated high sensitivity for DR screening using only one image per eye, offering a simpler protocol compared to the traditional approach of two images per eye. Simplifying the DR screening process could enhance adherence rates and overall program coverage.

Project description: Background: Artificial intelligence (AI) has shown promise in numerous experimental studies, particularly in skin cancer diagnostics. Translation of these findings into the clinic is the logical next step. This translation can only be successful if patients' concerns and questions are addressed suitably. We therefore conducted a survey to evaluate the patients' view of artificial intelligence in melanoma diagnostics in Germany, with a particular focus on patients with a history of melanoma. Participants and Methods: A web-based questionnaire was designed using LimeSurvey, sent by e-mail to university hospitals and melanoma support groups and advertised on social media. The anonymous questionnaire evaluated patients' expectations and concerns toward artificial intelligence in general as well as their attitudes toward different application scenarios. Descriptive analysis was performed with expression of categorical variables as percentages and 95% confidence intervals. Statistical tests were performed to investigate associations between sociodemographic data and selected items of the questionnaire. Results: 298 individuals (154 with a melanoma diagnosis, 143 without) responded to the questionnaire. About 94% [95% CI = 0.91-0.97] of respondents supported the use of artificial intelligence in medical approaches. 88% [95% CI = 0.85-0.92] would even make their own health data anonymously available for the further development of AI-based applications in medicine. Only 41% [95% CI = 0.35-0.46] of respondents were amenable to the use of artificial intelligence as a stand-alone system, whereas 94% [95% CI = 0.92-0.97] were amenable to its use as an assistance system for physicians. In sub-group analyses, only minor differences were detectable. Respondents with a previous history of melanoma were more amenable to the use of AI applications for early detection even at home. They would prefer an application scenario where physician and AI classify the lesions independently. With respect to AI-based applications in medicine, patients were concerned about insufficient data protection, impersonality and susceptibility to errors, but expected faster, more precise and unbiased diagnostics, fewer diagnostic errors and support for physicians. Conclusions: The vast majority of participants exhibited a positive attitude toward the use of artificial intelligence in melanoma diagnostics, especially as an assistance system.

Project description: Importance: The development of artificial intelligence (AI)-based melanoma classifiers typically calls for large, centralized datasets, requiring hospitals to give away their patient data, which raises serious privacy concerns. To address this concern, decentralized federated learning has been proposed, where classifier development is distributed across hospitals. Objective: To investigate whether a more privacy-preserving federated learning approach can achieve comparable diagnostic performance to a classical centralized (ie, single-model) and ensemble learning approach for AI-based melanoma diagnostics. Design, setting, and participants: This multicentric, single-arm diagnostic study developed a federated model for melanoma-nevus classification using histopathological whole-slide images prospectively acquired at 6 German university hospitals between April 2021 and February 2023 and benchmarked it using both a holdout and an external test dataset. Data analysis was performed from February to April 2023. Exposures: All whole-slide images were retrospectively analyzed by an AI-based classifier without influencing routine clinical care. Main outcomes and measures: The area under the receiver operating characteristic curve (AUROC) served as the primary end point for evaluating the diagnostic performance. Secondary end points included balanced accuracy, sensitivity, and specificity. Results: The study included 1025 whole-slide images of clinically melanoma-suspicious skin lesions from 923 patients, consisting of 388 histopathologically confirmed invasive melanomas and 637 nevi. The median (range) age at diagnosis was 58 (18-95) years for the training set, 57 (18-93) years for the holdout test dataset, and 61 (18-95) years for the external test dataset; the median (range) Breslow thickness was 0.70 (0.10-34.00) mm, 0.70 (0.20-14.40) mm, and 0.80 (0.30-20.00) mm, respectively. The federated approach (0.8579; 95% CI, 0.7693-0.9299) performed significantly worse than the classical centralized approach (0.9024; 95% CI, 0.8379-0.9565) in terms of AUROC on a holdout test dataset (pairwise Wilcoxon signed-rank, P < .001) but performed significantly better (0.9126; 95% CI, 0.8810-0.9412) than the classical centralized approach (0.9045; 95% CI, 0.8701-0.9331) on an external test dataset (pairwise Wilcoxon signed-rank, P < .001). Notably, the federated approach performed significantly worse than the ensemble approach on both the holdout (0.8867; 95% CI, 0.8103-0.9481) and external test dataset (0.9227; 95% CI, 0.8941-0.9479). Conclusions and relevance: The findings of this diagnostic study suggest that federated learning is a viable approach for the binary classification of invasive melanomas and nevi on a clinically representative distributed dataset. Federated learning can improve privacy protection in AI-based melanoma diagnostics while simultaneously promoting collaboration across institutions and countries. Moreover, it may have the potential to be extended to other image classification tasks in digital cancer histopathology and beyond.