Project description:The Gleason grading system has remained the most powerful prognostic predictor for patients with prostate cancer since the 1960s. Its application requires highly trained pathologists, is tedious, and yet suffers from limited inter-pathologist reproducibility, especially for the intermediate Gleason score 7. Automated annotation procedures constitute a viable solution to remedy these limitations. In this study, we present a deep learning approach for automated Gleason grading of prostate cancer tissue microarrays with Hematoxylin and Eosin (H&E) staining. Our system was trained using detailed Gleason annotations on a discovery cohort of 641 patients and was then evaluated on an independent test cohort of 245 patients annotated by two pathologists. On the test cohort, the inter-annotator agreements between the model and each pathologist, quantified via Cohen's quadratic kappa statistic, were 0.75 and 0.71 respectively, comparable with the inter-pathologist agreement (kappa = 0.71). Furthermore, the model's Gleason score assignments achieved pathology expert-level stratification of patients into prognostically distinct groups, on the basis of disease-specific survival data available for the test cohort. Overall, our study shows promising results regarding the applicability of deep learning-based solutions towards more objective and reproducible prostate cancer grading, especially for cases with heterogeneous Gleason patterns.
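As an illustration of the agreement measure named above, the following is a minimal sketch of Cohen's kappa with quadratic weights for two raters. The function name, the integer label encoding (0 to n_classes - 1), and the handling of degenerate input are assumptions made for this example; the abstract does not describe the study's own implementation.

```python
def quadratic_weighted_kappa(rater_a, rater_b, n_classes):
    """Cohen's kappa with quadratic weights for ordinal labels 0..n_classes-1.

    Hypothetical helper for illustration; not taken from the study.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of cases")
    if n_classes < 2:
        raise ValueError("at least two classes are required")

    n = len(rater_a)
    # Observed confusion matrix: observed[i][j] counts cases rated i by A and j by B.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1

    # Marginal label counts for each rater.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            # Quadratic disagreement weight: 0 on the diagonal, 1 at maximum distance.
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            expected = hist_a[i] * hist_b[j] / n  # count expected under independence
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0:
        # Both raters used one identical class throughout; kappa is undefined.
        raise ValueError("kappa is undefined when no disagreement is expected")
    return 1.0 - weighted_observed / weighted_expected
```

Under this definition, identical ratings give a kappa of 1, chance-level ratings give a value near 0, and disagreements between distant grades are penalized more heavily than disagreements between adjacent grades.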
Project description:The Gleason score contributes significantly to predicting prostate cancer outcomes and selecting the appropriate treatment option, but it is affected by well-known inter-observer variation. We present a novel deep learning-based automated Gleason grading system that does not require extensive region-level manual annotations by experts and/or complex algorithms for the automatic generation of region-level annotations. A total of 6664 and 936 prostate needle biopsy single-core slides (689 and 99 cases) from two institutions were used for system discovery and validation, respectively. Pathological diagnoses were converted into grade groups and used as the reference standard. The grade group prediction accuracy of the system was 77.5% (95% confidence interval (CI): 72.3-82.7%), the Cohen's kappa score (κ) was 0.650 (95% CI: 0.570-0.730), and the quadratic-weighted kappa score (κquad) was 0.897 (95% CI: 0.815-0.979). When trained on 621 cases from one institution and validated on 167 cases from the other institution, the system reached an accuracy of 67.4% (95% CI: 63.2-71.6%), a κ of 0.553 (95% CI: 0.495-0.610), and a κquad of 0.880 (95% CI: 0.822-0.938). To evaluate the impact of the proposed method, its performance was also compared with several baseline methods. While limited by case volume and a few other factors, the results of this study can contribute to the potential development of an artificial intelligence system to diagnose other cancers without extensive region-level annotations.
Project description:The early detection and accurate histopathological diagnosis of gastric cancer increase the chances of successful treatment. The worldwide shortage of pathologists offers a unique opportunity for the use of artificial intelligence assistance systems to alleviate the workload and increase diagnostic accuracy. Here, we report a clinically applicable system developed at the Chinese PLA General Hospital, China, using a deep convolutional neural network trained with 2,123 pixel-level annotated H&E-stained whole slide images. The model achieves a sensitivity near 100% and an average specificity of 80.6% on a real-world test dataset with 3,212 whole slide images digitized by three scanners. We show that the system could aid pathologists in improving diagnostic accuracy and preventing misdiagnoses. Moreover, we demonstrate that our system performs robustly with 1,582 whole slide images from two other medical centres. Our study suggests the feasibility and benefits of using histopathological artificial intelligence assistance systems in routine practice scenarios.
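For readers less familiar with the two metrics reported above, the following minimal sketch computes slide-level sensitivity and specificity from binary labels. The function name and the encoding (1 for malignant, 0 for benign) are assumptions for illustration; the abstract does not specify how the study aggregated pixel-level predictions into slide-level calls.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = positive.

    Hypothetical helper for illustration; not taken from the study.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")

    # Tally the four cells of the 2x2 confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")

    sensitivity = tp / (tp + fn)  # fraction of true positives that are detected
    specificity = tn / (tn + fp)  # fraction of true negatives that are cleared
    return sensitivity, specificity
```

A system tuned for sensitivity near 100%, as described above, accepts more false positives in exchange for missing almost no cancers, which is why specificity is the lower of the two figures.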
Project description:Importance: For prostate cancer, Gleason grading of the biopsy specimen plays a pivotal role in determining case management. However, Gleason grading is associated with substantial interobserver variability, resulting in a need for decision support tools to improve the reproducibility of Gleason grading in routine clinical practice. Objective: To evaluate the ability of a deep learning system (DLS) to grade diagnostic prostate biopsy specimens. Design, setting, and participants: The DLS was evaluated using 752 deidentified digitized images of formalin-fixed paraffin-embedded prostate needle core biopsy specimens obtained from 3 institutions in the United States, including 1 institution not used for DLS development. To obtain the Gleason grade group (GG), each specimen was first reviewed by 2 expert urologic subspecialists from a multi-institutional panel of 6 individuals (years of experience: mean, 25 years; range, 18-34 years). A third subspecialist reviewed discordant cases to arrive at a majority opinion. To reduce diagnostic uncertainty, all subspecialists had access to an immunohistochemical-stained section and 3 histologic sections for every biopsied specimen. Their review was conducted from December 2018 to June 2019. Main outcomes and measures: The frequency of the exact agreement of the DLS with the majority opinion of the subspecialists in categorizing each tumor-containing specimen as 1 of 5 categories: nontumor, GG1, GG2, GG3, or GG4-5. For comparison, the rate of agreement of 19 general pathologists' opinions with the subspecialists' majority opinions was also evaluated. Results: For grading tumor-containing biopsy specimens in the validation set (n = 498), the rate of agreement with subspecialists was significantly higher for the DLS (71.7%; 95% CI, 67.9%-75.3%) than for general pathologists (58.0%; 95% CI, 54.5%-61.4%) (P < .001). 
In subanalyses of biopsy specimens from an external validation set (n = 322), the Gleason grading performance of the DLS remained similar. For distinguishing nontumor from tumor-containing biopsy specimens (n = 752), the rate of agreement with subspecialists was 94.3% (95% CI, 92.4%-95.9%) for the DLS and similar at 94.7% (95% CI, 92.8%-96.3%) for general pathologists (P = .58). Conclusions and relevance: In this study, the DLS showed higher proficiency than general pathologists at Gleason grading prostate needle core biopsy specimens and generalized to an independent institution. Future research is necessary to evaluate the potential utility of using the DLS as a decision support tool in clinical workflows and to improve the quality of prostate cancer grading for therapy decisions.
Project description:Adenocarcinomas of the prostate can be categorized into tumor grades based on the extent to which the cancers histologically resemble normal prostate glands. Because grades are surrogates of intrinsic tumor behavior, characterizing the molecular phenotype of grade is of potential clinical importance. To identify molecular alterations underlying prostate cancer grades, we used microdissection to obtain specific cohorts of cancer cells corresponding to the most common Gleason patterns (patterns 3, 4, and 5) from 29 radical prostatectomy samples. We paired each cancer sample with matched benign lumenal prostate epithelial cells and profiled transcript abundance levels by microarray analysis. We identified an 86-gene model capable of distinguishing low-grade (pattern 3) from high-grade (patterns 4 and 5) cancers. This model performed with 76% accuracy when applied to an independent set of 30 primary prostate carcinomas. Using tissue microarrays comprising >800 prostate samples, we confirmed a significant association between high levels of monoamine oxidase A expression and poorly differentiated cancers by immunohistochemistry. We also confirmed grade-associated levels of defender against death (DAD1) protein and HSD17 beta4 transcripts by immunohistochemistry and quantitative RT-PCR, respectively. The altered expression of these genes provides functional insights into grade-associated features of therapy resistance and tissue invasion. Furthermore, in identifying a profile of 86 genes that distinguish high- from low-grade carcinomas, we have generated a set of potential targets for modulating the development and progression of the lethal prostate cancer phenotype.
Project description:Background: Gleason grading of prostate cancer is an important prognostic factor, but suffers from poor reproducibility, particularly among non-subspecialist pathologists. Although artificial intelligence (A.I.) tools have demonstrated Gleason grading on par with expert pathologists, it remains an open question whether and to what extent A.I. grading translates to better prognostication. Methods: In this study, we developed a system to predict prostate cancer-specific mortality via A.I.-based Gleason grading and subsequently evaluated its ability to risk-stratify patients on an independent retrospective cohort of 2807 prostatectomy cases from a single European center with 5-25 years of follow-up (median: 13, interquartile range 9-17). Results: Here, we show that the A.I.'s risk scores produced a C-index of 0.84 (95% CI 0.80-0.87) for prostate cancer-specific mortality. Upon discretizing these risk scores into risk groups analogous to pathologist Grade Groups (GG), the A.I. has a C-index of 0.82 (95% CI 0.78-0.85). On the subset of cases with a GG provided in the original pathology report (n = 1517), the A.I.'s C-indices are 0.87 and 0.85 for continuous and discrete grading, respectively, compared to 0.79 (95% CI 0.71-0.86) for GG obtained from the reports. These represent improvements of 0.08 (95% CI 0.01-0.15) and 0.07 (95% CI 0.00-0.14), respectively. Conclusions: Our results suggest that A.I.-based Gleason grading can lead to effective risk stratification, and warrants further evaluation for improving disease management.
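To make the C-index figures above concrete, the following is a minimal sketch of a concordance index for right-censored survival data. It assumes Harrell's estimator, which the abstract does not confirm was the one used; the function name and argument layout are likewise hypothetical. Pairs with tied follow-up times are skipped for simplicity.

```python
def concordance_index(times, events, risk_scores):
    """Harrell-style C-index for right-censored data (illustrative sketch).

    times:       follow-up time per patient
    events:      1 if the event (e.g. cancer-specific death) was observed, 0 if censored
    risk_scores: higher score means higher predicted risk
    """
    if not (len(times) == len(events) == len(risk_scores)):
        raise ValueError("inputs must have the same length")

    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's last follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # earlier event received the higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs; C-index is undefined")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. Discretizing continuous risk scores into a few groups creates more tied scores, which is consistent with the slightly lower C-index reported above for discrete grading.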
Project description:Despite revisions in 2005 and 2014, the Gleason prostate cancer (PCa) grading system still has major deficiencies. Combining Gleason scores into a three-tiered grouping (6, 7, 8-10) is used most frequently for prognostic and therapeutic purposes. The lowest score, assigned 6, may be misunderstood as a cancer in the middle of the grading scale, and 3+4=7 and 4+3=7 are often considered the same prognostic group. The objective was to verify that a new grading system accurately produces a smaller number of grades with the most significant prognostic differences, using multi-institutional and multimodal therapy data. Between 2005 and 2014, 20,845 consecutive men were treated by radical prostatectomy at five academic institutions; 5501 men were treated with radiotherapy at two academic institutions. Outcome was based on biochemical recurrence (BCR). The log-rank test assessed univariable differences in BCR by Gleason score. Separate univariable and multivariable Cox proportional hazards models used four possible categorizations of Gleason scores. In the surgery cohort, we found large differences in recurrence rates between both Gleason 3+4 versus 4+3 and Gleason 8 versus 9. The hazard ratios relative to Gleason score 6 were 1.9, 5.1, 8.0, and 11.7 for Gleason scores 3+4, 4+3, 8, and 9-10, respectively. These differences were attenuated in the radiotherapy cohort as a whole due to increased adjuvant or neoadjuvant hormones for patients with high-grade disease but were clearly seen in patients undergoing radiotherapy only. A five-grade group system had the highest prognostic discrimination for all cohorts on both univariable and multivariable analysis. 
The major limitation was the unavoidable use of prostate-specific antigen BCR as an end point as opposed to cancer-related death. The new PCa grading system has these benefits: more accurate grade stratification than current systems, a simplified grading system of five grades, and a lowest grade of 1, as opposed to 6, with the potential to reduce overtreatment of PCa. We looked at outcomes for prostate cancer (PCa) treated with radical prostatectomy or radiation therapy and validated a new grading system with more accurate grade stratification than current systems, including a simplified grading system of five grades and a lowest grade of 1, as opposed to 6, with the potential to reduce overtreatment of PCa.
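The five-grade grouping discussed above can be written as a small lookup from primary and secondary Gleason patterns. This is a minimal sketch: the function name is hypothetical, and restricting input to patterns 3 through 5 is an assumption made for the example, since the abstract only discusses scores of 6 and above.

```python
def grade_group(primary, secondary):
    """Map primary and secondary Gleason patterns (3-5) to grade groups 1-5.

    Illustrative helper; follows the categories named in the abstract:
    6, 3+4, 4+3, 8, and 9-10.
    """
    if primary not in (3, 4, 5) or secondary not in (3, 4, 5):
        raise ValueError("this sketch only handles Gleason patterns 3, 4, and 5")

    score = primary + secondary
    if score == 6:
        return 1  # Gleason 3+3
    if score == 7:
        # The order of the patterns matters: 3+4 and 4+3 are separate groups.
        return 2 if primary == 3 else 3
    if score == 8:
        return 4  # Gleason 4+4, 3+5, or 5+3
    return 5      # Gleason 9-10
```

For example, grade_group(3, 4) returns 2 while grade_group(4, 3) returns 3, which reflects the finding above that these two forms of Gleason score 7 carry different recurrence risk.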
Project description:Gleason grading, a risk stratification method for prostate cancer, is subjective and dependent on the experience and expertise of the reporting pathologist. Deep Learning (DL) systems have shown promise in enhancing the objectivity and efficiency of Gleason grading. However, DL networks exhibit domain shift and reduced performance on Whole Slide Images (WSI) from a source other than the training data. We propose a DL approach for segmenting and grading epithelial tissue using a novel training methodology that learns domain-agnostic features. In this retrospective study, we analyzed WSI from three cohorts of prostate cancer patients. 3741 core needle biopsies (CNBs) received from two centers were used for training. The κquad (quadratic-weighted kappa) and AUC were measured for grade group comparison and core-level detection accuracy, respectively. The system achieved an accuracy of 89.4% and κquad of 0.92 on the internal test set of 425 CNB WSI, and an accuracy of 85.3% and κquad of 0.96 on an external set of 1201 images. The system showed an accuracy of 83.1% and κquad of 0.93 on 1303 WSI from the third institution (blind evaluation). Our DL system, used as an assistive tool for CNB review, can potentially improve the consistency and accuracy of grading, resulting in better patient outcomes.
Project description:Purpose: This study aims to assess whole-mount Gleason grading (GG) in prostate cancer (PCa) accurately using a multiomics machine learning (ML) model and to compare its performance with biopsy-proven GG (bxGG) assessment. Materials and Methods: A total of 146 patients with PCa recruited in a pilot study of a prospective clinical trial (NCT02659527) were retrospectively included in the side study, all of whom underwent 68Ga-PSMA-11 integrated positron emission tomography (PET) / magnetic resonance (MR) before radical prostatectomy (RP) between May 2014 and April 2020. To establish a multiomics ML model, we quantified PET radiomics features, pathway-level genomics features from whole exome sequencing, and pathomics features derived from immunohistochemical staining of 11 biomarkers. Based on the multiomics dataset, five ML models were established and validated using 100-fold Monte Carlo cross-validation. Results: Among five ML models, the random forest (RF) model performed best in terms of the area under the curve (AUC). Compared to bxGG assessment alone, the RF model was superior in terms of AUC (0.87 vs 0.75), specificity (0.72 vs 0.61), positive predictive value (0.79 vs 0.75), and accuracy (0.78 vs 0.77) and showed slightly decreased sensitivity (0.83 vs 0.89) and negative predictive value (0.80 vs 0.81). Among the feature categories, bxGG was identified as the most important feature, followed by pathomics, clinical, radiomics and genomics features. The three important individual features were bxGG, PSA staining and one intensity-related radiomics feature. Conclusion: The findings demonstrate a superior assessment of the developed multiomics-based ML model in whole-mount GG compared to the current clinical baseline of bxGG. This enables personalized patient management by identifying high-risk PCa patients for RP.
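The Monte Carlo cross-validation mentioned above repeatedly draws a fresh random train/test partition instead of rotating through fixed folds. The following minimal sketch generates such partitions; the function name, the 20% test fraction, and the seed are assumptions for illustration, since the abstract states only that 100 repetitions were used.

```python
import random

def monte_carlo_splits(n_samples, n_repeats=100, test_fraction=0.2, seed=0):
    """Yield (train_indices, test_indices) for repeated random splits.

    Illustrative sketch; test_fraction and seed are assumed values,
    not parameters reported by the study.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    rng = random.Random(seed)  # local generator keeps the splits reproducible
    n_test = max(1, round(n_samples * test_fraction))
    indices = list(range(n_samples))
    for _ in range(n_repeats):
        shuffled = indices[:]  # copy so each repeat starts from the same base order
        rng.shuffle(shuffled)
        # The first n_test shuffled indices form the test set, the rest the training set.
        yield shuffled[n_test:], shuffled[:n_test]
```

With 146 patients this yields 100 independent partitions, and a metric such as AUC would be averaged over them. Unlike k-fold cross-validation, a given patient may appear in the test set of several repetitions or of none.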
Project description:Intracranial aneurysm is a common life-threatening disease. Computed tomography angiography is recommended as the standard diagnostic tool; yet, interpretation can be time-consuming and challenging. We present a specific deep-learning-based model trained on 1,177 digital subtraction angiography verified bone-removal computed tomography angiography cases. The model has good tolerance to image quality and is tested on images from different manufacturers. Simulated real-world studies are conducted in consecutive internal and external cohorts, in which it achieves improved patient-level and lesion-level sensitivity compared to that of radiologists and expert neurosurgeons. A specific cohort of suspected acute ischemic stroke is employed, and it is found that 99.0% of predicted-negative cases can be trusted with high confidence, leading to a potential reduction in human workload. A prospective study is warranted to determine whether the algorithm could improve patients' care in comparison to clinicians' assessment.