Project description:Gene expression profiles were generated from 199 primary breast cancer patients. Samples 1-176 were used in another study, GEO Series GSE22820, and form the training data set in this study. Samples 200-222 form a validation set. These data are used to build a machine learning classifier for estrogen receptor (ER) status. RNA was isolated from 199 primary breast cancer patients. A machine learning classifier was built to predict ER status using only three gene features.
Project description:Large-scale serum miRNomics in combination with machine learning could lead to the development of a blood-based cancer classification system.
Project description:Gene expression profiles were generated from 199 primary breast cancer patients. Samples 1-176 were used in another study, GEO Series GSE22820, and form the training data set in this study. Samples 200-222 form a validation set. These data are used to build a machine learning classifier for estrogen receptor (ER) status.
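The split described above (samples 1-176 for training, samples 200-222 for validation) can be sketched as a partition by sample number. This is a minimal illustration only: the function name `split_by_sample_number` and the idea of representing samples as plain integers are assumptions, not part of the original submission.

```python
def split_by_sample_number(sample_numbers):
    """Partition sample numbers into training, validation, and unassigned lists.

    Ranges follow the description: 1-176 training, 200-222 validation.
    (Hypothetical helper for illustration.)
    """
    training, validation, unassigned = [], [], []
    for n in sample_numbers:
        if 1 <= n <= 176:
            training.append(n)
        elif 200 <= n <= 222:
            validation.append(n)
        else:
            unassigned.append(n)
    return training, validation, unassigned


# Sample numbers as stated in the description.
all_samples = list(range(1, 177)) + list(range(200, 223))
train, valid, other = split_by_sample_number(all_samples)
print(len(train), len(valid), len(other))
```

The two ranges together cover 176 + 23 = 199 samples, which matches the stated number of patients.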
Project description:The RNA polymerase II core promoter is the site of convergence of the signals that lead to the initiation of transcription. Here, we perform a comparative analysis of the downstream core promoter region (DPR) in Drosophila and humans by using machine learning. These studies revealed a distinct human-specific version of the DPR and led to the use of the machine learning models for the identification of synthetic extreme DPR motifs with specificity for human transcription factors relative to Drosophila factors, and vice versa. More generally, machine learning models could be analogously used to design synthetic promoter elements with customized functional properties.
Project description:We used machine learning algorithms to identify a hypoxia-associated methylation signature in patients with HPV-negative HNSCC in the TCGA-HNSCC cohort. This submission forms the basis of the independent validation cohort used to test the Hypoxia-M classifier in our study.
Project description:Background:
To assist clinicians with diagnosis and optimal treatment decision-making, we attempted to develop and validate an artificial intelligence prediction model for lung metastasis (LM) in colorectal cancer (CRC) patients.
Method:
The clinicopathological characteristics of 46037 CRC patients from the Surveillance, Epidemiology, and End Results (SEER) database and 2779 CRC patients from a multi-center external validation set were collected retrospectively. After feature selection by univariate and multivariate analyses, six machine learning (ML) models, namely logistic regression, K-nearest neighbor, support vector machine, decision tree, random forest, and balanced random forest (BRF), were developed and validated for LM prediction. The best-performing optimized model was compared with the clinical predictor. In addition, LM patients stratified by risk score were used for survival analysis.
Project description:Background: Clinical misdiagnosis between cutaneous squamous cell carcinoma (cSCC) and basal cell carcinoma (BCC) poses treatment challenges and carries risks of recurrence, metastasis, and increased morbidity and mortality. Objective: We aimed to identify discriminant protein markers for cSCC and BCC using a minimally invasive proteome sampling method called e-biopsy, which employs electroporation for non-thermal cell permeabilization, together with machine learning. Methods: E-biopsy facilitated ex vivo proteome extraction from 21 cSCC and 21 BCC pathologically validated human cancers. LC/MS/MS profiling of 126 proteomes was followed by machine learning analysis to identify proteins distinguishing cSCC from BCC. To validate the identified panel, we used proteomes sampled by e-biopsy from 20 unrelated cSCC and 46 unrelated BCC human cancers, and differential expression analysis of published transcriptomics. Cornulin, the discriminant biomarker most commonly chosen by the machine learning models, was also validated using fluorescent immunohistochemistry. Results: 192 proteomes sampled from 108 patients were analyzed. Machine learning-based approaches resulted in a set of 11 potential biomarker proteins that can be used to construct a model with 95.2% average cross-validation accuracy, BCC precision of 93.6±14.5%, cSCC precision of 98.4±7.2%, specificity of 97.7±11.8%, and per-patient sensitivity of 92.7±15.3%. Protein-protein interaction analysis revealed a novel interaction network connecting 10 of the 11 resulting proteins. Histological and transcriptomic validation confirmed cornulin as a discriminant marker that is significantly lower in cSCC than in BCC. Conclusions: E-biopsy combined with machine learning provides a novel approach to sampling molecular biomarkers from skin for biomarker detection and differential expression analysis between cSCC and BCC.
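For readers unfamiliar with the reported metrics, the sketch below shows how accuracy, per-class precision, sensitivity, and specificity follow from a binary confusion matrix. It assumes cSCC is treated as the positive class and BCC as the negative class, which the description does not state, and the counts are hypothetical values chosen for illustration, not results from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute classification metrics from confusion-matrix counts.

    Assumption for illustration: cSCC is the positive class, BCC the negative.
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "cscc_precision": tp / (tp + fp),  # predicted cSCC that are truly cSCC
        "bcc_precision": tn / (tn + fn),   # predicted BCC that are truly BCC
        "sensitivity": tp / (tp + fn),     # true cSCC that were detected
        "specificity": tn / (tn + fp),     # true BCC that were correctly excluded
    }


# Hypothetical counts for a held-out fold of 21 cSCC and 21 BCC samples.
metrics = binary_metrics(tp=18, fp=1, tn=20, fn=3)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In a cross-validation setting, these metrics would be computed per fold and then averaged, which is one way figures of the form mean±spread can arise.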