Project description: Cell proliferation is essential for the development and maintenance of all organisms and is dysregulated in cancer. Using synchronized cells progressing through the cell cycle, pioneering microarray studies defined cell cycle genes based on cyclic variation in their expression. However, the concordance of the small number of synchronized cell studies has been limited, leading to discrepancies in the definition of the transcriptionally regulated set of cell cycle genes within and between species. Here we present an informatics approach based on Boolean logic to identify cell cycle genes. This approach used the vast array of publicly available gene expression data sets to query similarity to CCNB1, which encodes the cyclin subunit of the Cdk1-cyclin B complex that triggers the G2-to-M transition. In addition to highlighting conservation of cell cycle genes across large evolutionary distances, this approach identified contexts where well-studied genes known to act during the cell cycle are expressed and potentially acting in nondivision contexts. An accessible web platform enables a detailed exploration of the cell cycle gene lists generated using the Boolean logic approach. The methods employed are straightforward to extend to processes other than the cell cycle.
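The seed-gene query described above can be illustrated with a small sketch. The description does not state how expression is thresholded or how similarity to CCNB1 is scored, so everything here is an assumption made for illustration: a median split into high/low calls, a simple mismatch fraction as the similarity criterion, the function names, and the toy expression values (GENE_A and GENE_B are hypothetical).

```python
from statistics import median

def binarize(values, threshold=None):
    """Label each sample high (True) or low (False) relative to a threshold."""
    if threshold is None:
        threshold = median(values)  # assumption: median split, not the published rule
    return [v > threshold for v in values]

def is_equivalent(seed_bits, gene_bits, max_mismatch=0.1):
    """True if the high/low calls agree in all but a small fraction of samples."""
    mismatches = sum(s != g for s, g in zip(seed_bits, gene_bits))
    return mismatches / len(seed_bits) <= max_mismatch

def query_seed(expression, seed="CCNB1", max_mismatch=0.1):
    """Return genes whose high/low pattern tracks the seed gene across samples."""
    seed_bits = binarize(expression[seed])
    return sorted(
        gene
        for gene, values in expression.items()
        if gene != seed
        and is_equivalent(seed_bits, binarize(values), max_mismatch)
    )

# Synthetic toy data: one list of expression values per gene, one entry per sample.
expression = {
    "CCNB1":  [1.0, 1.2, 0.9, 8.1, 7.9, 8.4, 1.1, 8.0],
    "GENE_A": [0.5, 0.7, 0.6, 6.0, 6.2, 5.9, 0.4, 6.1],  # hypothetical, tracks CCNB1
    "GENE_B": [5.0, 0.2, 5.1, 0.3, 5.2, 0.1, 5.3, 0.2],  # hypothetical, unrelated
}

print(query_seed(expression))
```

On real data the same query would run over many public data sets at once, and a noise-tolerant threshold and a statistical test would likely replace the fixed mismatch cutoff used here.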
Project description: Boolean implications (if-then rules) provide a conceptually simple, uniform and highly scalable way to find associations between pairs of random variables. In this paper, we propose to use Boolean implications to find relationships between variables of different data types (mutation, copy number alteration, DNA methylation and gene expression) from the glioblastoma (GBM) and ovarian serous cystadenoma (OV) data sets from The Cancer Genome Atlas (TCGA). We find hundreds of thousands of Boolean implications from these data sets. A direct comparison of the relationships found by Boolean implications and those found by commonly used methods for mining associations shows that existing methods would miss relationships found by Boolean implications. Furthermore, many relationships exposed by Boolean implications reflect important aspects of cancer biology. Examples of our findings include cis relationships between copy number alteration, DNA methylation and expression of genes, a new hierarchy of mutations and recurrent copy number alterations, loss-of-heterozygosity of well-known tumor suppressors, and the hypermethylation phenotype associated with IDH1 mutations in GBM. The Boolean implication results used in the paper can be accessed at http://crookneck.stanford.edu/microarray/TCGANetworks/.
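A Boolean implication "if A then B" between two binary variables holds when the quadrant (A true, B false) is nearly empty. The sketch below is one way to test that, under stated assumptions: the sparsity statistic, the error rate, and both thresholds are illustrative choices and are not taken from the description, and the sample vectors are synthetic. The variable names echo the IDH1 example mentioned above only for readability.

```python
import math

def quadrant_counts(a, b):
    """Count samples in each (a, b) quadrant for two Boolean vectors."""
    counts = {(False, False): 0, (False, True): 0,
              (True, False): 0, (True, True): 0}
    for x, y in zip(a, b):
        counts[(bool(x), bool(y))] += 1
    return counts

def implies(a, b, min_stat=2.0, max_error=0.1):
    """Test 'if a then b' by checking that the (a=True, b=False) quadrant is sparse.

    min_stat and max_error are illustrative thresholds (assumptions).
    """
    counts = quadrant_counts(a, b)
    total = len(a)
    n_a = counts[(True, False)] + counts[(True, True)]        # samples with a true
    n_not_b = counts[(True, False)] + counts[(False, False)]  # samples with b false
    observed = counts[(True, False)]
    if n_a == 0 or n_not_b == 0:
        return False  # no samples to support or refute the rule
    # Count expected in the quadrant if a and b were independent.
    expected = n_a * n_not_b / total
    stat = (expected - observed) / math.sqrt(expected)
    # Average fraction of violating samples along the quadrant's row and column.
    error = 0.5 * (observed / n_a + observed / n_not_b)
    return stat >= min_stat and error <= max_error

# Synthetic example: 40 samples, 10 mutated (all hypermethylated),
# 10 unmutated but hypermethylated, 20 with neither.
idh1_mutated = [True] * 10 + [False] * 30
hypermethylated = [True] * 20 + [False] * 20

print(implies(idh1_mutated, hypermethylated))  # mutation => hypermethylation
print(implies(hypermethylated, idh1_mutated))  # the converse rule
```

The example is deliberately asymmetric: the rule holds in one direction but not the other, which is the kind of relationship a symmetric measure such as correlation tends to understate.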
Project description: Background: Patients with newly diagnosed non-metastatic prostate adenocarcinoma are typically classified as at low, intermediate, or high risk of disease progression using blood prostate-specific antigen concentration, tumour T category, and tumour pathological Gleason score. Classification is used both to predict clinical outcome and to inform initial management. However, significant heterogeneity is observed in outcome, particularly within the intermediate risk group, and there is an urgent need for additional markers to refine risk prediction. Recently developed web-based visualization and analysis tools have facilitated rapid interrogation of large transcriptome datasets, and querying broadly across multiple large datasets should identify predictors that are widely applicable. Methods: We used camcAPP, cBioPortal, CRN, and NIH NCI GDC Data Portal to mine publicly available large prostate cancer datasets. A test set of biomarkers was developed by identifying transcripts that had: 1) altered abundance in prostate cancer, 2) altered expression in patients with Gleason score 7 tumours and biochemical recurrence, 3) correlation of expression with time until biochemical recurrence across three datasets (Cambridge, Stockholm, MSKCC). Transcripts that met these criteria were then examined in a validation dataset (TCGA-PRAD) using univariate and multivariable models to predict biochemical recurrence in patients with Gleason score 7 tumours. Results: Twenty transcripts met the test criteria, and 12 were validated in TCGA-PRAD Gleason score 7 patients. Ten of these transcripts remained prognostic in Gleason score 3 + 4 = 7, a sub-group of Gleason score 7 patients typically considered at a lower risk for poor outcome and often not targeted for aggressive management. All transcripts positively associated with recurrence encode or regulate mitosis and cell cycle-related proteins. The top performer was BUB1, one of four key MIR145-3P microRNA targets upregulated in hormone-sensitive as well as castration-resistant PCa. SRD5A2 converts testosterone to its more active form and was negatively associated with biochemical recurrence. Conclusions: Unbiased mining of large patient datasets identified 12 transcripts that independently predicted disease recurrence risk in Gleason score 7 prostate cancer. The mitosis and cell cycle proteins identified are also implicated in progression to castration-resistant prostate cancer, revealing a pivotal role for loss of cell cycle control in the latter.