Dataset Information

Assessment of deep neural networks for the diagnosis of benign and malignant skin neoplasms in comparison with dermatologists: A retrospective validation study.

ABSTRACT: BACKGROUND: The diagnostic performance of convolutional neural networks (CNNs) for diagnosing several types of skin neoplasms has been demonstrated as comparable with that of dermatologists using clinical photography. However, the generalizability should be demonstrated using a large-scale external dataset that includes most types of skin neoplasms. In this study, the performance of a neural network algorithm was compared with that of dermatologists in both real-world practice and experimental settings.

METHODS AND FINDINGS: To demonstrate generalizability, the skin cancer detection algorithm (https://rcnn.modelderm.com) developed in our previous study was used without modification. We conducted a retrospective study with all single-lesion biopsied cases (43 disorders; 40,331 clinical images from 10,426 cases: 1,222 malignant cases and 9,204 benign cases; mean age [standard deviation (SD)], 52.1 [18.3] years; 4,701 men [45.1%]) obtained from the Department of Dermatology, Severance Hospital in Seoul, Korea between January 1, 2008 and March 31, 2019. Using the external validation dataset, the predictions of the algorithm were compared with the clinical diagnoses of 65 attending physicians who had recorded the clinical diagnoses with thorough examinations in real-world practice. In addition, the results obtained by the algorithm for the data of randomly selected batches of 30 patients were compared with those obtained by 44 dermatologists in experimental settings; the dermatologists were only provided with multiple images of each lesion, without clinical information. With regard to the determination of malignancy, the area under the curve (AUC) achieved by the algorithm was 0.863 (95% confidence interval [CI] 0.852-0.875), when unprocessed clinical photographs were used. The sensitivity and specificity of the algorithm at the predefined high-specificity threshold were 62.7% (95% CI 59.9-65.1) and 90.0% (95% CI 89.4-90.6), respectively. Furthermore, the sensitivity and specificity of the first clinical impression of 65 attending physicians were 70.2% and 95.6%, respectively, which were superior to those of the algorithm (McNemar test; p < 0.0001). The positive and negative predictive values of the algorithm were 45.4% (CI 43.7-47.3) and 94.8% (CI 94.4-95.2), respectively, whereas those of the first clinical impression were 68.1% and 96.0%, respectively. In the reader test conducted using images corresponding to batches of 30 patients, the sensitivity and specificity of the algorithm at the predefined threshold were 66.9% (95% CI 57.7-76.0) and 87.4% (95% CI 82.5-92.2), respectively. Furthermore, the sensitivity and specificity derived from the first impression of 44 of the participants were 65.8% (95% CI 55.7-75.9) and 85.7% (95% CI 82.4-88.9), respectively, which are values comparable with those of the algorithm (Wilcoxon signed-rank test; p = 0.607 and 0.097). Limitations of this study include the exclusive use of high-quality clinical photographs taken in hospitals and the lack of ethnic diversity in the study population.

CONCLUSIONS: Our algorithm could diagnose skin tumors with nearly the same accuracy as a dermatologist when the diagnosis was performed solely with photographs. However, as a result of limited data relevancy, the performance was inferior to that of actual medical examination. To achieve more accurate predictive diagnoses, clinical information should be integrated with imaging information.
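
The algorithm's reported predictive values can be related to its sensitivity, its specificity, and the case mix. The sketch below is a consistency check, not the authors' code. It assumes the predictive values are computed per case over the 1,222 malignant and 9,204 benign cases, and it works from the rounded percentages in the abstract, so it uses expected counts rather than the actual confusion matrix. The function name `predictive_values` is hypothetical.

```python
def predictive_values(sensitivity, specificity, n_positive, n_negative):
    """Return (PPV, NPV) from sensitivity, specificity, and class sizes.

    Works on expected counts, so the inputs may be rounded percentages.
    """
    true_pos = sensitivity * n_positive
    false_neg = n_positive - true_pos
    true_neg = specificity * n_negative
    false_pos = n_negative - true_neg
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Algorithm at the predefined high-specificity threshold (values from the abstract).
ppv, npv = predictive_values(0.627, 0.900, n_positive=1222, n_negative=9204)
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")  # prints "PPV = 45.4%, NPV = 94.8%"
```

Under these assumptions the result agrees with the reported 45.4% and 94.8%. It also shows why the positive predictive value is modest despite 90.0% specificity: benign cases outnumber malignant ones by roughly 7.5 to 1, so false positives make up a large share of all positive calls.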

SUBMITTER: Han SS

PROVIDER: S-EPMC7688128 | biostudies-literature | 2020 Nov

REPOSITORIES: biostudies-literature

Publications

Assessment of deep neural networks for the diagnosis of benign and malignant skin neoplasms in comparison with dermatologists: A retrospective validation study.

Han Seung Seog, Moon Ik Jun, Kim Seong Hwan, Na Jung-Im, Kim Myoung Shin, Park Gyeong Hun, Park Ilwoo, Kim Keewon, Lim Woohyung, Lee Ju Hee, Chang Sung Eun

PLoS Medicine, 2020-11-25, issue 11

PMID: 33237903

Similar Datasets

Project description: Objectives: To develop and test the performance of computerized ultrasound image analysis using deep neural networks (DNNs) in discriminating between benign and malignant ovarian tumors and to compare its diagnostic accuracy with that of subjective assessment (SA) by an ultrasound expert. Methods: We included 3077 (grayscale, n = 1927; power Doppler, n = 1150) ultrasound images from 758 women with ovarian tumors, who were classified prospectively by expert ultrasound examiners according to IOTA (International Ovarian Tumor Analysis) terms and definitions. Histological outcome from surgery (n = 634) or long-term (≥ 3 years) follow-up (n = 124) served as the gold standard. The dataset was split into a training set (n = 508; 314 benign and 194 malignant), a validation set (n = 100; 60 benign and 40 malignant) and a test set (n = 150; 75 benign and 75 malignant). We used transfer learning on three pre-trained DNNs: VGG16, ResNet50 and MobileNet. Each model was trained, and the outputs calibrated, using temperature scaling. An ensemble of the three models was then used to estimate the probability of malignancy based on all images from a given case. The DNN ensemble classified the tumors as benign or malignant (Ovry-Dx1 model); or as benign, inconclusive or malignant (Ovry-Dx2 model). The diagnostic performance of the DNN models, in terms of sensitivity and specificity, was compared to that of SA for classifying ovarian tumors in the test set. Results: At a sensitivity of 96.0%, Ovry-Dx1 had a specificity similar to that of SA (86.7% vs 88.0%; P = 1.0). Ovry-Dx2 had a sensitivity of 97.1% and a specificity of 93.7%, when designating 12.7% of the lesions as inconclusive. By complementing Ovry-Dx2 with SA in inconclusive cases, the overall sensitivity (96.0%) and specificity (89.3%) were not significantly different from using SA in all cases (P = 1.0). Conclusion: Ultrasound image analysis using DNNs can predict ovarian malignancy with a diagnostic accuracy comparable to that of human expert examiners, indicating that these models may have a role in the triage of women with an ovarian tumor. © 2020 The Authors. Ultrasound in Obstetrics & Gynecology published by John Wiley & Sons Ltd on behalf of International Society of Ultrasound in Obstetrics and Gynecology.
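
The description above mentions calibrating model outputs with temperature scaling. As a minimal sketch of that general technique, and not the authors' implementation: the logits are divided by a single scalar T before the softmax, and T is chosen to minimize the negative log-likelihood on held-out data. The toy logits and labels, the candidate grid, and the function names are all assumptions made for illustration. A plain grid search stands in for the gradient-based optimizer that is more commonly used.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits divided by a temperature (temperature > 0)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mean_nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    losses = [
        -math.log(softmax(row, temperature)[label])
        for row, label in zip(logit_rows, labels)
    ]
    return sum(losses) / len(losses)


def fit_temperature(logit_rows, labels, candidates):
    """Pick the candidate temperature with the lowest validation NLL."""
    return min(candidates, key=lambda t: mean_nll(logit_rows, labels, t))


# Toy validation set (invented): the last example is confidently wrong,
# which is the overconfidence that temperature scaling is meant to soften.
val_logits = [[4.0, 0.0], [3.0, 0.0], [0.0, 5.0], [4.0, 0.0]]
val_labels = [0, 0, 1, 1]
candidates = [0.5 + 0.1 * i for i in range(46)]  # 0.5, 0.6, ..., 5.0

temperature = fit_temperature(val_logits, val_labels, candidates)
calibrated = softmax([4.0, 0.0], temperature)
print(f"T = {temperature:.1f}, calibrated probabilities = {calibrated}")
```

Dividing by a positive T never changes which class has the highest score, so the predicted label stays the same and only the confidence is adjusted.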

Project description: Bacterial vaginosis (BV) is caused by the excessive and imbalanced growth of bacteria in the vagina, affecting 30 to 50% of women. Gram staining followed by Nugent scoring based on bacterial morphotypes under the microscope is considered the gold standard for BV diagnosis; this method is often labor-intensive and time-consuming, and results vary from person to person. We developed and optimized a convolutional neural network (CNN) model and evaluated its ability to automatically identify and classify three categories of Nugent scores from microscope images. The CNN model was first established with a panel of microscopic images with Nugent scores determined by experts. The model was trained by minimizing the cross-entropy loss function and optimized by using a momentum optimizer. The separate test sets of images collected from three hospitals were evaluated by the CNN model. The CNN model consisted of 25 convolutional layers, 2 pooling layers, and a fully connected layer. The model obtained 82.4% sensitivity and 96.6% specificity with the 5,815 validation images when altered vaginal flora and BV were considered the positive samples, which was better than the rates achieved by top-level technologists and obstetricians in China. The capability of our model for generalization was so strong that it exhibited 75.1% accuracy in three categories of Nugent scores on the independent test set of 1,082 images, which was 6.6% higher than the average of three technologists, who hold bachelor's degrees in medicine and are qualified to make diagnostic decisions. When three technologists ran one specimen in triplicate, the precision of three categories of Nugent scores was 54.0%. One hundred three samples diagnosed by two technologists on different days showed a repeatability of 90.3%. The CNN model outperformed human health care practitioners in terms of accuracy and stability for three categories of Nugent score diagnosis. The deep learning model may offer translational applications in automating diagnosis of bacterial vaginosis with proper supporting hardware.

Project description: Importance: Convolutional neural networks (CNNs) achieve expert-level accuracy in the diagnosis of pigmented melanocytic lesions. However, the most common types of skin cancer are nonpigmented and nonmelanocytic, and are more difficult to diagnose. Objective: To compare the accuracy of a CNN-based classifier with that of physicians with different levels of experience. Design, setting, and participants: A CNN-based classification model was trained on 7895 dermoscopic and 5829 close-up images of lesions excised at a primary skin cancer clinic between January 1, 2008, and July 13, 2017, for a combined evaluation of both imaging methods. The combined CNN (cCNN) was tested on a set of 2072 unknown cases and compared with results from 95 human raters who were medical personnel, including 62 board-certified dermatologists, with different experience in dermoscopy. Main outcomes and measures: The proportions of correct specific diagnoses and the accuracy to differentiate between benign and malignant lesions measured as an area under the receiver operating characteristic curve served as main outcome measures. Results: Among 95 human raters (51.6% female; mean age, 43.4 years; 95% CI, 41.0-45.7 years), the participants were divided into 3 groups (according to years of experience with dermoscopy): beginner raters (<3 years), intermediate raters (3-10 years), or expert raters (>10 years). The area under the receiver operating characteristic curve of the trained cCNN was higher than human ratings (0.742; 95% CI, 0.729-0.755 vs 0.695; 95% CI, 0.676-0.713; P < .001). The specificity was fixed at the mean level of human raters (51.3%), and therefore the sensitivity of the cCNN (80.5%; 95% CI, 79.0%-82.1%) was higher than that of human raters (77.6%; 95% CI, 74.7%-80.5%). The cCNN achieved a higher percentage of correct specific diagnoses compared with human raters (37.6%; 95% CI, 36.6%-38.4% vs 33.5%; 95% CI, 31.5%-35.6%; P = .001) but not compared with experts (37.3%; 95% CI, 35.7%-38.8% vs 40.0%; 95% CI, 37.0%-43.0%; P = .18). Conclusions and relevance: Neural networks are able to classify dermoscopic and close-up images of nonpigmented lesions as accurately as human experts in an experimental setting.
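
The comparison above fixes the classifier's specificity at the human raters' mean level and then reads off its sensitivity. The sketch below shows one way to do that from raw scores. It is an illustration under stated assumptions, not the study's procedure: the scores are invented toy values, the function names are hypothetical, and a lesion is called malignant when its score is at or above the threshold.

```python
def specificity(benign_scores, threshold):
    """Fraction of benign lesions scored below the threshold (true negatives)."""
    return sum(s < threshold for s in benign_scores) / len(benign_scores)


def sensitivity(malignant_scores, threshold):
    """Fraction of malignant lesions scored at or above the threshold."""
    return sum(s >= threshold for s in malignant_scores) / len(malignant_scores)


def threshold_at_specificity(benign_scores, target):
    """Lowest threshold whose specificity on the benign scores reaches the target."""
    candidates = sorted(set(benign_scores)) + [float("inf")]
    for t in candidates:
        if specificity(benign_scores, t) >= target:
            return t
    raise ValueError("target specificity must be at most 1.0")


# Toy malignancy scores (invented for illustration only).
benign = [0.05, 0.10, 0.20, 0.30, 0.40, 0.55, 0.60, 0.70, 0.80, 0.90]
malignant = [0.30, 0.50, 0.65, 0.75, 0.85, 0.95]

threshold = threshold_at_specificity(benign, target=0.5)
print(threshold, sensitivity(malignant, threshold))
```

Matching the operating point in this way compares the two readers on sensitivity alone, rather than on a trade-off between sensitivity and specificity.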

Project description: Purpose: Adrenal incidentalomas must be differentiated from adrenocortical cancer (ACC). Currently, size, growth, and imaging characteristics determine the potential for malignancy but are imperfect. The aim was to evaluate whether urinary small molecules (<800 Da) are associated with ACC. Experimental Design: Preoperative fasting urine specimens from patients with ACC (n = 19) and benign adrenal tumors (n = 46) were analyzed by unbiased ultraperformance liquid chromatography/mass spectrometry. Creatinine-normalized features were analyzed by Progenesis, SIMCA, and unpaired t test adjusted by FDR. Features with an AUC >0.8 were identified through fragmentation patterns and database searches. All lead features were assessed in an independent set from patients with ACC (n = 11) and benign adrenal tumors (n = 46) and in a subset of tissue samples from patients with ACC (n = 15) and benign adrenal tumors (n = 15) in the training set. Results: Sixty-nine features were discovered and four known metabolites identified. Urinary creatine riboside was elevated 2.1-fold (P = 0.0001) in patients with ACC. L-tryptophan, Nε,Nε,Nε-trimethyl-L-lysine, and 3-methylhistidine were lower 0.33-fold (P < 0.0001), 0.56-fold (P < 0.0001), and 0.33-fold (P = 0.0003) in patients with ACC, respectively. Combined multivariate analysis of the four biomarkers showed an AUC of 0.89 [sensitivity 94.7% (confidence interval {CI}, 73.9%-99.1%), specificity 82.6% (CI, 68.6%-92.2%), PPV 69.2% (CI, 48.2%-85.6%), and NPV 97.4% (CI, 86.5%-99.6%)] for distinguishing ACC from benign tumors. Of the four, creatine riboside and four unknown features were validated. Creatine riboside, Nε,Nε,Nε-trimethyl-L-lysine, and two unknown features were elevated in ACC tumors. Conclusions: There are unique urinary metabolic features in patients with ACC with some metabolites present in patient tumor samples. Urinary creatine riboside can differentiate benign adrenal neoplasms from ACC. Clin Cancer Res; 23(17); 5302-10. ©2017 AACR.
