Dataset Information

Effects of Study Population, Labeling and Training on Glaucoma Detection Using Deep Learning Algorithms.

ABSTRACT:

Purpose

To compare performance of independently developed deep learning algorithms for detecting glaucoma from fundus photographs and to evaluate strategies for incorporating new data into models.

Methods

Two fundus photograph datasets from the Diagnostic Innovations in Glaucoma Study/African Descent and Glaucoma Evaluation Study and Matsue Red Cross Hospital were used to independently develop deep learning algorithms for detection of glaucoma at the University of California, San Diego, and the University of Tokyo. We compared three versions of the University of California, San Diego, and University of Tokyo models: original (no retraining), sequential (retraining only on new data), and combined (training on combined data). Independent datasets were used to test the algorithms.

Results

The original University of California, San Diego and University of Tokyo models performed similarly (area under the receiver operating characteristic curve = 0.96 and 0.97, respectively) for detection of glaucoma in the Matsue Red Cross Hospital dataset, but not the Diagnostic Innovations in Glaucoma Study/African Descent and Glaucoma Evaluation Study data (0.79 and 0.92; P < .001), respectively. Model performance was higher when classifying moderate-to-severe compared with mild disease (area under the receiver operating characteristic curve = 0.98 and 0.91; P < .001), respectively. Models trained with the combined strategy generally had better performance across all datasets than the original strategy.

Conclusions

Deep learning glaucoma detection can achieve high accuracy across diverse datasets with appropriate training strategies. Because model performance was influenced by the severity of disease, labeling, training strategies, and population characteristics, reporting accuracy stratified by relevant covariates is important for cross study comparisons.

Translational relevance

High sensitivity and specificity of deep learning algorithms for moderate-to-severe glaucoma across diverse populations suggest a role for artificial intelligence in the detection of glaucoma in primary care.

SUBMITTER: Christopher M

PROVIDER: S-EPMC7396194 | biostudies-literature | 2020 Apr

REPOSITORIES: biostudies-literature

ACCESS DATA

Publications

Effects of Study Population, Labeling and Training on Glaucoma Detection Using Deep Learning Algorithms.

Christopher Mark M Nakahara Kenichi K Bowd Christopher C Proudfoot James A JA Belghith Akram A Goldbaum Michael H MH Rezapour Jasmin J Weinreb Robert N RN Fazio Massimo A MA Girkin Christopher A CA Liebmann Jeffrey M JM De Moraes Gustavo G Murata Hiroshi H Tokumo Kana K Shibata Naoto N Fujino Yuri Y Matsuura Masato M Kiuchi Yoshiaki Y Tanito Masaki M Asaoka Ryo R Zangwill Linda M LM

Translational vision science & technology 20200428 2

<h4>Purpose</h4>To compare performance of independently developed deep learning algorithms for detecting glaucoma from fundus photographs and to evaluate strategies for incorporating new data into models.<h4>Methods</h4>Two fundus photograph datasets from the Diagnostic Innovations in Glaucoma Study/African Descent and Glaucoma Evaluation Study and Matsue Red Cross Hospital were used to independently develop deep learning algorithms for detection of glaucoma at the University of California, San ...[more]

PMID: 32818088

Similar Datasets

Project description:Evaluating and tracking wound size is a fundamental metric for the wound assessment process. Good location and size estimates can enable proper diagnosis and effective treatment. Traditionally, laboratory wound healing studies include a collection of images at uniform time intervals exhibiting the wounded area and the healing process in the test animal, often a mouse. These images are then manually observed to determine key metrics -such as wound size progress- relevant to the study. However, this task is a time-consuming and laborious process. In addition, defining the wound edge could be subjective and can vary from one individual to another even among experts. Furthermore, as our understanding of the healing process grows, so does our need to efficiently and accurately track these key factors for high throughput (e.g., over large-scale and long-term experiments). Thus, in this study, we develop a deep learning-based image analysis pipeline that aims to intake non-uniform wound images and extract relevant information such as the location of interest, wound only image crops, and wound periphery size over-time metrics. In particular, our work focuses on images of wounded laboratory mice that are used widely for translationally relevant wound studies and leverages a commonly used ring-shaped splint present in most images to predict wound size. We apply the method to a dataset that was never meant to be quantified and, thus, presents many visual challenges. Additionally, the data set was not meant for training deep learning models and so is relatively small in size with only 256 images. We compare results to that of expert measurements and demonstrate preservation of information relevant to predicting wound closure despite variability from machine-to-expert and even expert-to-expert. The proposed system resulted in high fidelity results on unseen data with minimal human intervention. Furthermore, the pipeline estimates acceptable wound sizes when less than 50% of the images are missing reference objects.

Project description:ObjectiveTo assess the accuracy of probabilistic deep learning models to discriminate normal eyes and eyes with glaucoma from fundus photographs and visual fields.DesignAlgorithm development for discriminating normal and glaucoma eyes using data from multicenter, cross-sectional, case-control study.Subjects and participantsFundus photograph and visual field data from 1,655 eyes of 929 normal and glaucoma subjects to develop and test deep learning models and an independent group of 196 eyes of 98 normal and glaucoma patients to validate deep learning models.Main outcome measuresAccuracy and area under the receiver-operating characteristic curve (AUC).MethodsFundus photographs and OCT images were carefully examined by clinicians to identify glaucomatous optic neuropathy (GON). When GON was detected by the reader, the finding was further evaluated by another clinician. Three probabilistic deep convolutional neural network (CNN) models were developed using 1,655 fundus photographs, 1,655 visual fields, and 1,655 pairs of fundus photographs and visual fields collected from Compass instruments. Deep learning models were trained and tested using 80% of fundus photographs and visual fields for training set and 20% of the data for testing set. Models were further validated using an independent validation dataset. The performance of the probabilistic deep learning model was compared with that of the corresponding deterministic CNN model.ResultsThe AUC of the deep learning model in detecting glaucoma from fundus photographs, visual fields, and combined modalities using development dataset were 0.90 (95% confidence interval: 0.89-0.92), 0.89 (0.88-0.91), and 0.94 (0.92-0.96), respectively. The AUC of the deep learning model in detecting glaucoma from fundus photographs, visual fields, and both modalities using the independent validation dataset were 0.94 (0.92-0.95), 0.98 (0.98-0.99), and 0.98 (0.98-0.99), respectively. The AUC of the deep learning model in detecting glaucoma from fundus photographs, visual fields, and both modalities using an early glaucoma subset were 0.90 (0.88,0.91), 0.74 (0.73,0.75), 0.91 (0.89,0.93), respectively. Eyes that were misclassified had significantly higher uncertainty in likelihood of diagnosis compared to eyes that were classified correctly. The uncertainty level of the correctly classified eyes is much lower in the combined model compared to the model based on visual fields only. The AUCs of the deterministic CNN model using fundus images, visual field, and combined modalities based on the development dataset were 0.87 (0.85,0.90), 0.88 (0.84,0.91), and 0.91 (0.89,0.94), and the AUCs based on the independent validation dataset were 0.91 (0.89,0.93), 0.97 (0.95,0.99), and 0.97 (0.96,0.99), respectively, while the AUCs based on an early glaucoma subset were 0.88 (0.86,0.91), 0.75 (0.73,0.77), and 0.92 (0.89,0.95), respectively.Conclusion and relevanceProbabilistic deep learning models can detect glaucoma from multi-modal data with high accuracy. Our findings suggest that models based on combined visual field and fundus photograph modalities detects glaucoma with higher accuracy. While probabilistic and deterministic CNN models provided similar performance, probabilistic models generate certainty level of the outcome thus providing another level of confidence in decision making.

Dataset Information

Effects of Study Population, Labeling and Training on Glaucoma Detection Using Deep Learning Algorithms.

Purpose

Methods

Results

Conclusions

Translational relevance

Publications

Effects of Study Population, Labeling and Training on Glaucoma Detection Using Deep Learning Algorithms.

Similar Datasets

OmicsDI is part of the ELIXIR infrastructure

Tweets