Dataset Information

A transcriptome-based deep neural network classifier for identifying the site of origin in mucinous cancer

ABSTRACT:

Background: There is a lack of tools for identifying the site of origin in mucinous cancer. This study aimed to evaluate the performance of a transcriptome-based classifier for identifying the site of origin in mucinous cancer.

Methods: Transcriptomic data of 1,878 non-mucinous and 82 mucinous cancer specimens from The Cancer Genome Atlas, covering 7 sites of origin, namely the uterine cervix (CESC), colon (COAD), pancreas (PAAD), stomach (STAD), uterine endometrium (UCEC), uterine carcinosarcoma (UCS), and ovary (OV), were used as the training and validation sets, respectively. Transcriptomic data of 14 mucinous cancer specimens from a tissue archive were used as the test set. To identify the site of origin, a set of 100 differentially expressed genes was selected for each site. After removing genes selected for more than one site, 626 unique genes remained, and their RNA expression profiles at each site of origin were used to train the deep neural network classifier. The performance of the classifier was estimated on the training, validation, and test sets.

Results: The accuracy of the model was 0.977 in the training set and 0.939 (77/82) in the validation set. In the test set, the model achieved an accuracy of 0.857 (12/14). t-SNE analysis showed that the test-set samples fell within the clusters formed by the training set.

Conclusion: Although limited by the small sample size, we showed that a transcriptome-based classifier could correctly identify the site of origin of mucinous cancer.
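The gene-panel step described in the Methods (100 differentially expressed genes per site of origin, pooled and deduplicated into one feature set) can be sketched as follows. This is a minimal illustration, assuming each site already has a gene list ranked by differential expression; the function name `build_gene_panel`, the parameter `k`, and the toy gene identifiers are hypothetical and are not taken from the study.

```python
from typing import Dict, List


def build_gene_panel(ranked_genes: Dict[str, List[str]], k: int = 100) -> List[str]:
    """Hypothetical helper: take the top-k genes for each site of origin and
    merge them into one panel, keeping only the first occurrence of any gene
    that was selected for more than one site."""
    panel: List[str] = []
    seen = set()
    for site, genes in ranked_genes.items():
        for gene in genes[:k]:
            if gene not in seen:  # skip genes already picked for another site
                seen.add(gene)
                panel.append(gene)
    return panel


# Toy input with k = 3 instead of 100; gene names are placeholders.
ranked = {
    "COAD": ["G1", "G2", "G3", "G4"],
    "STAD": ["G1", "G5", "G6", "G2"],
    "OV": ["G7", "G8", "G9", "G10"],
}
panel = build_gene_panel(ranked, k=3)
print(len(panel), panel)
```

With 7 sites and k = 100 this yields at most 700 candidates; the reported panel of 626 genes implies that 74 of those selections were duplicates of genes already chosen for another site.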

ORGANISM(S): Homo sapiens

PROVIDER: GSE163126 | GEO | 2020/12/15

REPOSITORIES: GEO

Similar Datasets

Project description: Objective: The objective of this study was to estimate the accuracy of a transcriptome-based classifier in the differential diagnosis of uterine leiomyoma and leiomyosarcoma. Methods: We manually selected 114 normal uterine tissue samples and 31 leiomyosarcoma samples from publicly available transcriptome data in UCSC Xena as training/validation sets. We developed a pre-processing procedure and a gene selection method to sensitively detect genes with larger variance in leiomyosarcoma than in normal uterine tissue. Using this method, twenty genes were selected to build the transcriptome-based classifier. The prediction accuracies of deep feedforward neural network (DNN), support vector machine (SVM), Random Forest (RF), and Gradient Boosting (GB) models were examined. We interpreted the biological function of the selected genes via network-based analysis using GeneMANIA. To validate the performance of the trained model, we additionally collected 35 clinical samples of leiomyosarcoma and leiomyoma as a test set (18 and 17 samples in the first and second test sets, respectively). Results: We identified genes whose expression was highly variable in leiomyosarcoma but conserved in normal uterine samples. These genes were mainly associated with DNA replication, the cell cycle, and the DNA damage checkpoint. Among the evaluated machine learning classifiers, the DNN had the highest accuracy and average AUC in the training data set. Because gene selection and model training were performed on leiomyosarcoma and normal uterine tissue, the model's ability to discriminate between leiomyosarcoma and leiomyoma still had to be demonstrated. Thus, the trained model was further validated on newly collected clinical samples of leiomyosarcoma and leiomyoma. The DNN classifier achieved AUCs of 0.917 and 0.914, supporting the conclusion that the selected genes, in conjunction with the DNN classifier, discriminate well between leiomyosarcoma and leiomyoma in clinical samples.
Conclusion: The transcriptome-based classifier accurately distinguished uterine leiomyoma from leiomyosarcoma.
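The description does not specify the exact statistic behind "genes of larger variance in leiomyosarcoma than normal uterine tissues". One simple way to express that idea is to rank genes by the ratio of their expression variance in tumor samples to their variance in normal samples, as sketched below. This is an assumption for illustration only; the function `select_high_variance_genes`, the `eps` guard, and the toy data are hypothetical and not the study's actual pre-processing or selection procedure.

```python
import statistics
from typing import Dict, List


def select_high_variance_genes(
    tumor: Dict[str, List[float]],
    normal: Dict[str, List[float]],
    n_genes: int = 20,
    eps: float = 1e-8,
) -> List[str]:
    """Hypothetical variance-ratio filter: score each gene by
    var(tumor) / var(normal) and return the n_genes highest-scoring genes."""
    scores = {}
    for gene in tumor.keys() & normal.keys():
        var_tumor = statistics.variance(tumor[gene])    # sample variance (n - 1)
        var_normal = statistics.variance(normal[gene])
        scores[gene] = var_tumor / (var_normal + eps)   # eps avoids division by zero
    return sorted(scores, key=scores.get, reverse=True)[:n_genes]


# Toy expression values (placeholders, not study data).
tumor = {"G1": [1.0, 5.0, 9.0], "G2": [2.0, 2.1, 1.9], "G3": [0.0, 4.0, 8.0]}
normal = {"G1": [3.0, 3.1, 2.9], "G2": [2.0, 2.1, 1.9], "G3": [4.0, 4.5, 3.5]}
print(select_high_variance_genes(tumor, normal, n_genes=2))
```

Here G1 is variable in tumor but nearly constant in normal tissue, so it ranks first; G2 behaves the same in both groups and is dropped.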

Project description: Background We have recently constructed a DNA methylation classifier that can discriminate between pancreatic ductal adenocarcinoma (PAAD) liver metastasis and intrahepatic cholangiocarcinoma (iCCA) with high accuracy (PAAD-iCCA-Classifier). PAAD is one of the leading causes of cancer of unknown primary, and its diagnosis is based on the exclusion of other malignancies. Therefore, our focus was to investigate whether the PAAD-iCCA-Classifier can be used to diagnose PAAD metastases from other sites. Methods To this end, the anomaly detection filter of the initial classifier was expanded by 8 additional mimicker carcinomas, amounting to a total of 10 carcinomas in the negative class. We validated the updated version of the classifier on a validation set consisting of a biological cohort (n = 3579) and a technical cohort (n = 15). We then assessed the performance of the classifier on a test set, which included a positive control cohort of 16 PAAD metastases from various sites and a negative control cohort of 124 samples, consisting of 96 breast cancer metastases from 18 anatomical sites and 28 carcinoma metastases to the brain. Results The updated PAAD-iCCA-Classifier achieved 98.21% accuracy on the biological validation samples and 100% on the technical validation samples. The classifier also correctly identified 15/16 (93.75%) positive control metastases as PAAD and correctly classified 122/124 (98.39%) negative control samples, for an overall test-set accuracy of 97.85%. We used this DNA methylation dataset to explore the organotropism of PAAD metastases and observed that PAAD liver metastases are distinct from PAAD peritoneal carcinomatosis and primary PAAD, and are characterized by specific copy number alterations and hypomethylation of enhancers involved in epithelial-mesenchymal transition.
Conclusions The updated PAAD-iCCA-Classifier (available at https://classifier.tgc-research.de/) can accurately classify PAAD samples from various metastatic sites and it can serve as a diagnostic aid.
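The overall test-set accuracy quoted above pools the two control cohorts: total correct calls divided by total samples, rather than the mean of the two per-cohort accuracies. A minimal sketch of that arithmetic, using only the counts reported in the description (the function name `pooled_accuracy` is illustrative):

```python
from typing import List, Tuple


def pooled_accuracy(cohorts: List[Tuple[int, int]]) -> float:
    """Pool (correct, total) counts across cohorts into one accuracy."""
    correct = sum(c for c, _ in cohorts)
    total = sum(n for _, n in cohorts)
    return correct / total


positive = (15, 16)    # PAAD metastases correctly called PAAD
negative = (122, 124)  # negative controls correctly rejected
acc = pooled_accuracy([positive, negative])
print(f"{acc:.4f}")    # 137 / 140
```

This gives 137/140 ≈ 0.9786; the 97.85% figure above corresponds to this value truncated, rather than rounded, to two decimal places.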