Dataset Information

Development and Validation of a Deep Learning Algorithm for Gleason Grading of Prostate Cancer From Biopsy Specimens.

ABSTRACT:

Importance

For prostate cancer, Gleason grading of the biopsy specimen plays a pivotal role in determining case management. However, Gleason grading is associated with substantial interobserver variability, resulting in a need for decision support tools to improve the reproducibility of Gleason grading in routine clinical practice.

Objective

To evaluate the ability of a deep learning system (DLS) to grade diagnostic prostate biopsy specimens.

Design, setting, and participants

The DLS was evaluated using 752 deidentified digitized images of formalin-fixed paraffin-embedded prostate needle core biopsy specimens obtained from 3 institutions in the United States, including 1 institution not used for DLS development. To obtain the Gleason grade group (GG), each specimen was first reviewed by 2 expert urologic subspecialists from a multi-institutional panel of 6 individuals (years of experience: mean, 25 years; range, 18-34 years). A third subspecialist reviewed discordant cases to arrive at a majority opinion. To reduce diagnostic uncertainty, all subspecialists had access to an immunohistochemically stained section and 3 histologic sections for every biopsied specimen. Their review was conducted from December 2018 to June 2019.
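The reference-standard procedure described above (two subspecialist reviews, with a third review only for discordant cases) can be sketched as a small adjudication function. This is a minimal illustration, not the study's code: the function name `majority_opinion` and the string labels are hypothetical, and because the abstract does not say how a three-way disagreement was resolved, the sketch simply raises an error in that case.

```python
from typing import Optional


def majority_opinion(first: str, second: str, third: Optional[str] = None) -> str:
    """Return the reference label for one specimen under a 2 + 1 review scheme.

    Hypothetical labels, e.g. "nontumor", "GG1", "GG2", "GG3", "GG4-5".
    """
    if first == second:
        return first  # concordant pair: no third review needed
    if third is None:
        raise ValueError("discordant case requires a third review")
    if third in (first, second):
        return third  # third reviewer agrees with one of the first two
    # Three different opinions: no majority exists. The abstract does not say
    # how such cases were handled, so this sketch flags them instead of guessing.
    raise ValueError("no majority among the three reviewers")


print(majority_opinion("GG2", "GG3", "GG3"))
```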

Main outcomes and measures

The frequency of the exact agreement of the DLS with the majority opinion of the subspecialists in categorizing each tumor-containing specimen as 1 of 5 categories: nontumor, GG1, GG2, GG3, or GG4-5. For comparison, the rate of agreement of 19 general pathologists' opinions with the subspecialists' majority opinions was also evaluated.
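A minimal sketch of this primary outcome, assuming labels are stored as strings and that grade groups 4 and 5 are merged into the single GG4-5 category before comparison, as the five-category outcome implies. The names `to_category` and `exact_agreement` are hypothetical.

```python
from typing import Sequence

# Merge grade groups 4 and 5 into the single "GG4-5" outcome category.
COLLAPSE = {"GG4": "GG4-5", "GG5": "GG4-5"}


def to_category(label: str) -> str:
    """Map a raw label onto one of: nontumor, GG1, GG2, GG3, GG4-5."""
    return COLLAPSE.get(label, label)


def exact_agreement(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of specimens whose category exactly matches the reference label."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(to_category(p) == to_category(r) for p, r in zip(predicted, reference))
    return matches / len(reference)


# GG5 vs GG4 counts as agreement once both are collapsed to GG4-5.
print(exact_agreement(["GG1", "GG5", "GG2", "nontumor"], ["GG1", "GG4", "GG3", "nontumor"]))
```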

Results

For grading tumor-containing biopsy specimens in the validation set (n = 498), the rate of agreement with subspecialists was significantly higher for the DLS (71.7%; 95% CI, 67.9%-75.3%) than for general pathologists (58.0%; 95% CI, 54.5%-61.4%) (P < .001). In subanalyses of biopsy specimens from an external validation set (n = 322), the Gleason grading performance of the DLS remained similar. For distinguishing nontumor from tumor-containing biopsy specimens (n = 752), the rate of agreement with subspecialists was 94.3% (95% CI, 92.4%-95.9%) for the DLS and similar at 94.7% (95% CI, 92.8%-96.3%) for general pathologists (P = .58).
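The abstract does not state how its confidence intervals were computed. As a rough check only, a Wilson score interval (a standard binomial interval, swapped in here as an assumption) can be applied to the DLS agreement rate, taking 357 of 498 specimens as the count consistent with 71.7%. It gives approximately 67.6%-75.5%, close to but not identical with the reported 67.9%-75.3%, so the study presumably used a different interval method.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (about 95% when z = 1.96)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("invalid counts")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# 357 / 498 is the only integer count that rounds to the reported 71.7%.
low, high = wilson_interval(357, 498)
print(f"{357 / 498:.1%} (Wilson 95% CI, {low:.1%}-{high:.1%})")
```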

Conclusions and relevance

In this study, the DLS showed higher proficiency than general pathologists at Gleason grading prostate needle core biopsy specimens and generalized to an independent institution. Future research is necessary to evaluate the potential utility of using the DLS as a decision support tool in clinical workflows and to improve the quality of prostate cancer grading for therapy decisions.

SUBMITTER: Nagpal K

PROVIDER: S-EPMC7378872 | biostudies-literature | 2020 Sep

REPOSITORIES: biostudies-literature

Publications

Development and Validation of a Deep Learning Algorithm for Gleason Grading of Prostate Cancer From Biopsy Specimens.

Nagpal K, Foote D, Tan F, Liu Y, Chen PC, Steiner DF, Manoj N, Olson N, Smith JL, Mohtashamian A, Peterson B, Amin MB, Evans AJ, Sweet JW, Cheung C, van der Kwast T, Sangoi AR, Zhou M, Allan R, Humphrey PA, Hipp JD, Gadepalli K, Corrado GS, Peng LH, Stumpe MC, Mermel CH

JAMA Oncology, 2020 Sep 1, issue 9

PMID: 32701148

Similar Datasets

Project description:
Background: The pathologic diagnosis and Gleason grading of prostate cancer are time-consuming, error-prone, and subject to interobserver variability. Machine learning offers opportunities to improve the diagnosis, risk stratification, and prognostication of prostate cancer.
Objective: To develop a state-of-the-art deep learning algorithm for the histopathologic diagnosis and Gleason grading of prostate biopsy specimens.
Design, setting, and participants: A total of 85 prostate core biopsy specimens from 25 patients were digitized at 20× magnification and annotated for Gleason 3, 4, and 5 prostate adenocarcinoma by a urologic pathologist. From these virtual slides, we sampled 14803 image patches of 256×256 pixels, approximately balanced for malignancy.
Outcome measurements and statistical analysis: We trained and tested a deep residual convolutional neural network to classify each patch at two levels: (1) coarse (benign vs malignant) and (2) fine (benign vs Gleason 3 vs 4 vs 5). Model performance was evaluated using fivefold cross-validation. Randomization tests were used for hypothesis testing of model performance versus chance.
Results and limitations: The model demonstrated 91.5% accuracy (p<0.001) at coarse-level classification of image patches as benign versus malignant (0.93 sensitivity, 0.90 specificity, and 0.95 average precision). The model demonstrated 85.4% accuracy (p<0.001) at fine-level classification of image patches as benign versus Gleason 3 versus Gleason 4 versus Gleason 5 (0.83 sensitivity, 0.94 specificity, and 0.83 average precision), with the greatest number of confusions in distinguishing between Gleason 3 and 4, and between Gleason 4 and 5. Limitations include the small sample size and the need for external validation.
Conclusions: In this study, a deep learning-based computer vision algorithm demonstrated excellent performance for the histopathologic diagnosis and Gleason grading of prostate cancer.
Patient summary: We developed a deep learning algorithm that demonstrated excellent performance for the diagnosis and grading of prostate cancer.
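The abstract mentions randomization tests of model performance versus chance without giving details. One common form, assumed here, is a label-permutation test: shuffle the true labels many times, recompute accuracy against the fixed predictions, and report the fraction of shuffles that score at least as well as the observed accuracy. The function name `randomization_test` and its defaults are hypothetical.

```python
import random
from typing import Sequence


def randomization_test(
    pred: Sequence[int], truth: Sequence[int], n_perm: int = 10_000, seed: int = 0
) -> tuple[float, float]:
    """Return (observed accuracy, one-sided p-value versus chance) by permuting labels."""
    if len(pred) != len(truth) or not truth:
        raise ValueError("need two non-empty label lists of equal length")
    rng = random.Random(seed)
    observed = sum(p == t for p, t in zip(pred, truth)) / len(truth)
    shuffled = list(truth)
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real link between predictions and labels
        acc = sum(p == t for p, t in zip(pred, shuffled)) / len(shuffled)
        if acc >= observed:
            at_least += 1
    # The add-one correction keeps the estimated p-value strictly above zero.
    return observed, (at_least + 1) / (n_perm + 1)


obs, p_value = randomization_test([0, 1] * 50, [0, 1] * 50, n_perm=2000)
print(obs, p_value)
```

Because the predictions stay fixed and only the labels move, the test keeps the class balance of the data, which matters here since the patches were sampled to be roughly balanced for malignancy.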

Project description:
Importance: A chronic shortage of donor kidneys is compounded by a high discard rate, and this rate is directly associated with biopsy specimen evaluation, which shows poor reproducibility among pathologists. A deep learning algorithm for measuring percent global glomerulosclerosis (an important predictor of outcome) on images of kidney biopsy specimens could enable pathologists to more reproducibly and accurately quantify percent global glomerulosclerosis, potentially saving organs that would have been discarded.
Objective: To compare the performances of pathologists with a deep learning model on quantification of percent global glomerulosclerosis in whole-slide images of donor kidney biopsy specimens, and to determine the potential benefit of a deep learning model on organ discard rates.
Design, setting, and participants: This prognostic study used whole-slide images acquired from 98 hematoxylin-eosin-stained frozen and 51 permanent donor biopsy specimen sections retrieved from 83 kidneys. Serial annotation by 3 board-certified pathologists served as ground truth for model training and for evaluation. Images of kidney biopsy specimens were obtained from the Washington University database (retrieved between June 2015 and June 2017). Cases were selected randomly from a database of more than 1000 cases to include biopsy specimens representing an equitable distribution within 0% to 5%, 6% to 10%, 11% to 15%, 16% to 20%, and more than 20% global glomerulosclerosis.
Main outcomes and measures: Correlation coefficient (r) and root-mean-square error (RMSE) with respect to annotations were computed for cross-validated model predictions and on-call pathologists' estimates of percent global glomerulosclerosis when using individual and pooled slide results. Data were analyzed from March 2018 to August 2020.
Results: The cross-validated model results of section images retrieved from 83 donor kidneys showed higher correlation with annotations (r = 0.916; 95% CI, 0.886-0.939) than on-call pathologists (r = 0.884; 95% CI, 0.825-0.923), and this correlation was enhanced when pooling glomeruli counts from multiple levels (r = 0.933; 95% CI, 0.898-0.956). Model prediction error for single levels (RMSE, 5.631; 95% CI, 4.735-6.517) was 14% lower than that of on-call pathologists (RMSE, 6.523; 95% CI, 5.191-7.783), improving to 22% with multiple levels (RMSE, 5.094; 95% CI, 3.972-6.301). The model decreased the likelihood of unnecessary organ discard by 37% compared with pathologists.
Conclusions and relevance: The findings of this prognostic study suggest that this deep learning model provided a scalable and robust method to quantify percent global glomerulosclerosis in whole-slide images of donor kidneys. The model performance improved by analyzing multiple levels of a section, surpassing the capacity of pathologists in the time-sensitive setting of examining donor biopsy specimens. The results indicate the potential of a deep learning model to prevent erroneous donor organ discard.
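The two agreement metrics named in this abstract, and the relative error reductions it reports, can be sketched in plain Python. The helper names are hypothetical; the only inputs taken from the abstract are the three RMSE values, and the printed percentages reproduce the reported 14% and 22% reductions.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rmse(pred: Sequence[float], truth: Sequence[float]) -> float:
    """Root-mean-square error of predictions against reference annotations."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))


def relative_reduction(baseline: float, new: float) -> float:
    """Fractional decrease of `new` relative to `baseline`."""
    return (baseline - new) / baseline


# RMSE values from the Results: pathologists 6.523, model single level 5.631,
# model pooled over multiple levels 5.094.
print(f"{relative_reduction(6.523, 5.631):.0%}")
print(f"{relative_reduction(6.523, 5.094):.0%}")
```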

Project description:
Background and objective: The Gleason grading system is currently the clinical gold standard for determining prostate cancer aggressiveness. Prostate cancer is typically classified into one of 5 different categories, with 1 representing the most indolent disease and 5 reflecting the most aggressive disease. Grades 3 and 4 are the most common patterns and the most difficult to discriminate in clinical practice. Even though the degree of gland differentiation is the strongest determinant of Gleason grade, manual grading is subjective and is hampered by substantial inter-reader disagreement, especially with regard to intermediate grade groups.
Methods: To capture the topological characteristics and the degree of connectivity between nuclei around the gland, the concept of Homology Profile (HP) for prostate cancer grading is presented in this paper. HP is an algebraic tool whereby certain algebraic invariants are computed based on the structure of a topological space. We utilized the Statistical Representation of Homology Profile (SRHP) features to quantify the extent of glandular differentiation. The quantitative characteristics that represent the image patch are fed into a supervised classifier model for discrimination of grade patterns 3 and 4.
Results: On the basis of the novel homology profile, we evaluated 43 digitized images of prostate biopsy slides annotated for regions corresponding to Grades 3 and 4. The quantitative patch-level evaluation results showed that our approach achieved an Area Under Curve (AUC) of 0.96 and an accuracy of 0.89 in discriminating Grade 3 and 4 patches. Our approach was found to be superior to comparative methods, including handcrafted cellular features, the Stacked Sparse Autoencoder (SSAE) algorithm, and an end-to-end supervised learning method (DLGg). Also, slide-level quantitative and qualitative evaluation results reflect the ability of our approach to discriminate Gleason Grade 3 from 4 patterns on H&E tissue images.
Conclusions: We presented a novel Statistical Representation of Homology Profile (SRHP) approach for automated Gleason grading on prostate biopsy slides. The most discriminating topological descriptions of cancerous regions for grade 3 and 4 in prostate cancer were identified. Moreover, these characteristics of homology profile are interpretable, visually meaningful, and highly consistent with the rubric employed by pathologists for the task of Gleason grading.
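To make the idea of connectivity between nuclei concrete, the sketch below computes a simplified 0-dimensional homology profile: the number of connected components (Betti-0) of a set of nuclear centroids as the linking radius grows, using union-find. This is not the authors' SRHP method, which the abstract does not specify in detail; the input (2-D nuclear centroids) and the function name `betti0_profile` are assumptions for illustration.

```python
import math
from typing import Sequence

Point = tuple[float, float]


def betti0_profile(points: Sequence[Point], radii: Sequence[float]) -> list[int]:
    """Betti-0 (number of connected components) at each radius, in ascending radius order.

    Two points are linked when their Euclidean distance is <= the radius.
    """
    n = len(points)
    # All pairwise distances, sorted so edges can be added incrementally.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j) for i in range(n) for j in range(i + 1, n)
    )
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    components, k, profile = n, 0, []
    for r in sorted(radii):
        while k < len(edges) and edges[k][0] <= r:
            _, i, j = edges[k]
            ri, rj = find(i), find(j)
            if ri != rj:  # merging two components reduces Betti-0 by one
                parent[ri] = rj
                components -= 1
            k += 1
        profile.append(components)
    return profile


# Two tight pairs of "nuclei" far apart: 4 components, then 2, then 1.
print(betti0_profile([(0, 0), (1, 0), (10, 0), (11, 0)], [0.5, 2, 20]))
```

Summary statistics of such a profile (for example, the radius at which components merge) could then feed a classifier, which is the general shape of the pipeline the abstract describes.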

Project description:
Importance: Epstein-Barr virus (EBV)-associated gastric cancer (EBV-GC) is 1 of 4 molecular subtypes of GC and is confirmed by an expensive molecular test, EBV-encoded small RNA in situ hybridization. EBV-GC has 2 histologic characteristics, lymphoid stroma and lace-like tumor pattern, but predicting EBV-GC at biopsy is difficult even for experienced pathologists.
Objective: To develop and validate a deep learning algorithm to predict EBV status from pathology images of GC biopsy.
Design, setting, and participants: This diagnostic study developed a deep learning classifier to predict EBV-GC using image patches of tissue microarray (TMA) and whole slide images (WSIs) of GC and applied it to GC biopsy specimens from GCs diagnosed at Kangbuk Samsung Hospital between 2011 and 2020. For a quantitative evaluation and EBV-GC prediction on biopsy specimens, the area of each class and the fraction in total tissue or tumor area were calculated. Data were analyzed from March 5, 2021, to February 10, 2022.
Main outcomes and measures: Evaluation metrics of predictive model performance were assessed on accuracy, recall, precision, F1 score, area under the receiver operating characteristic curve (AUC), and κ coefficient.
Results: This study included 137 184 image patches from 16 TMAs (708 tissue cores), 24 WSIs, and 286 biopsy images of GC. The classifier was able to classify EBV-GC image patches from TMAs and WSIs with 94.70% accuracy, 0.936 recall, 0.938 precision, 0.937 F1 score, and 0.909 κ coefficient. The classifier was used for predicting and measuring the area and fraction of EBV-GC on biopsy tissue specimens. A 10% cutoff value for the predicted fraction of EBV-GC to tissue (EBV-GC/tissue area) produced the best prediction results in EBV-GC biopsy specimens and showed the highest AUC value (0.8723; 95% CI, 0.7560-0.9501). That cutoff also obtained high sensitivity (0.895) and moderate specificity (0.745) compared with experienced pathologist sensitivity (0.842) and specificity (0.854) when using the presence of lymphoid stroma and a lace-like pattern as diagnostic criteria. On prediction maps, EBV-GCs with lace-like pattern and lymphoid stroma showed the same prediction results as EBV-GC, but cases lacking these histologic features revealed heterogeneous prediction results of EBV-GC and non-EBV-GC areas.
Conclusions and relevance: This study showed the feasibility of EBV-GC prediction using a deep learning algorithm, even in biopsy samples. Use of such an image-based classifier before a confirmatory molecular test will reduce costs and tissue waste.
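The 10% decision rule described in the Results can be sketched as follows, assuming the per-biopsy predicted EBV-GC area and total tissue area are already available from the patch classifier. Whether the cutoff is inclusive (>=) is not stated in the abstract and is an assumption here, as are the function names `predict_ebv` and `sensitivity_specificity`.

```python
from typing import Sequence


def predict_ebv(ebv_area: float, tissue_area: float, cutoff: float = 0.10) -> bool:
    """Call a biopsy EBV-positive when predicted EBV-GC area / tissue area >= cutoff."""
    if tissue_area <= 0:
        raise ValueError("tissue area must be positive")
    return ebv_area / tissue_area >= cutoff


def sensitivity_specificity(calls: Sequence[bool], truth: Sequence[bool]) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(c and t for c, t in zip(calls, truth))
    fn = sum((not c) and t for c, t in zip(calls, truth))
    tn = sum((not c) and (not t) for c, t in zip(calls, truth))
    fp = sum(c and (not t) for c, t in zip(calls, truth))
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical biopsies as (predicted EBV-GC area, tissue area) in arbitrary units.
areas = [(15, 100), (5, 100), (12, 100), (2, 100)]
truth = [True, True, False, False]  # reference EBV status, e.g. from in situ hybridization
calls = [predict_ebv(e, t) for e, t in areas]
print(sensitivity_specificity(calls, truth))
```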