Project description:Aberrant hypermethylation of CpG dinucleotides located in CpG islands within the promoters of key cancer genes is an epigenetic abnormality associated with heritable transcriptional gene silencing and inactivation in cancer. The genes involved include important tumor suppressors affecting key pathways for tumor initiation and progression. These methylated sequences can serve as potentially valuable markers for cancer risk assessment, diagnosis, prognosis, and prediction of therapeutic responses. In addition, many key cancer genes may be targeted by both epigenetic and genetic alterations, and thus epigenetic analysis can help focus the search for mutations, and vice versa. Studies of major cancer types suggest that any individual patient’s tumor may harbor 300 or more DNA hypermethylated genes. In TCGA, a pilot project is underway to begin defining these genes for GBM via genomic approaches. The epigenetic pilot takes a two-tiered approach. The first tier involves pharmacological treatment of both well-established human GBM cell lines and a cell line grown as a neurosphere to enrich for tumor-propagating cells, with a DNA methylation inhibitor (5-aza-2’-deoxycytidine, DAC) or a histone deacetylase inhibitor (trichostatin A, TSA), followed by an expression transcriptome analysis as previously described (Schuebel et al.). This has resulted in identification of more than 3,700 total candidate genes. In the second tier, the top candidates are analyzed on a custom Illumina GoldenGate array with the capacity to monitor methylation at a single CpG dinucleotide in the CpG islands of 1,498 gene promoters for the high-throughput analysis of TCGA GBM samples. Keywords: Microarray, Hypermethylome, DNA-hypermethylation, DAC, TSA, Epigenetic, TCGA, The Cancer Genome Atlas, GBM, Glioblastoma, Glioblastoma multiforme, Brain
Project description:<p>The Cancer Genome Atlas (TCGA) is a comprehensive and coordinated effort to accelerate our understanding of the molecular basis of cancer through the application of genome analysis technologies, including large-scale genome sequencing. TCGA is a joint effort of the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI), which are both part of the National Institutes of Health, U.S. Department of Health and Human Services.</p> <p>TCGA projects are organized by cancer type or subtype. Click <a href="http://cancergenome.nih.gov/cancersselected" target="_blank">here</a> for a current list of cancer types selected for study in TCGA.</p> <p>Data from TCGA (e.g., gene expression, copy number variation and clinical information), are available via the <a href="https://gdc.cancer.gov/" target="_blank">Genomic Data Commons (GDC)</a>.</p> <p>Data from TCGA projects are organized into two tiers: <b>Open Access and Controlled Access</b>. <ul> <li>Open Access data tier contains data that cannot be attributed to an individual research participant. The Open Access data tier does not require user certification. Data in Open Access tier are available in the TCGA Data Portal.</li> <li>Controlled Access data tier contains individual-level genotype data that are unique to an individual. Access to data in the Controlled Access data tier requires user certification through <a href="https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi?login=&page=login" target="_blank">dbGaP Authorized Access</a>.</li> <li>Controlled Access data types consist of the following: <ul> <li>Individual germline variant data (SNP .cel files)</li> <li>Primary sequence data (.bam files), which are available at GDC</li> <li>Clinical free text fields</li> <li>Exon Array files (for Glioblastoma and Ovarian projects only)</li> </ul> </li> </ul> </p> <p><b>NOTE: TCGA strives to release most data in the open access tier. 
Individual genotype or sequence files are prominent exceptions. Commonly requested files such as descriptions of somatic mutations or clinical data are open access.</b></p> <p>Please go to this page: <a href="https://tcga-data.nci.nih.gov/docs/publications/" target="_blank">https://tcga-data.nci.nih.gov/docs/publications/</a> to access all data associated with TCGA tumor specific publications.</p> <p><b>The TCGA study is utilized in the following dbGaP substudies.</b> To view genotypes and other molecular data collected in these substudies, please click on the following substudies below or in the "Substudies" section of this top-level study page phs000178 TCGA study. <ul> <li><a href="./study.cgi?study_id=phs000854">phs000854</a> Genome-wide Analysis of Noncoding Regulatory Mutations in Cancer</li> </ul> </p>
Project description:<p>The Cancer Genome Atlas (TCGA) is a comprehensive and coordinated effort to accelerate our understanding of the molecular basis of cancer through the application of genome analysis technologies, including large-scale genome sequencing. TCGA is a joint effort of the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI), which are both part of the National Institutes of Health, U.S. Department of Health and Human Services.</p> <p>TCGA projects are organized by cancer type or subtype. Click <a href="http://cancergenome.nih.gov/cancersselected" target="_blank">here</a> for a current list of cancer types selected for study in TCGA.</p> <p>Data from TCGA (e.g., gene expression, copy number variation and clinical information), are available via the <a href="https://tcga-data.nci.nih.gov/tcga/" target="_blank">TCGA Data Portal</a>, EXCEPT for the genomic sequence data (.bam files), which are hosted at the <a href="https://cghub.ucsc.edu/" target="_blank">Cancer Genomics Hub (CGHub)</a>.</p> <p>Data from TCGA projects are organized into two tiers: <b>Open Access and Controlled Access</b>. <ul> <li>Open Access data tier contains data that cannot be attributed to an individual research participant. The Open Access data tier does not require user certification. Data in Open Access tier are available in the TCGA Data Portal.</li> <li>Controlled Access data tier contains individual-level genotype data that are unique to an individual. 
Access to data in the Controlled Access data tier requires user certification through <a href="https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi?login=&page=login" target="_blank">dbGaP Authorized Access</a>.</li> <li>Controlled Access data types consist of the following: <ul> <li>Individual germline variant data (SNP .cel files)</li> <li>Primary sequence data (.bam files), which are available at CGHub</li> <li>Clinical free text fields</li> <li>Exon Array files (for Glioblastoma and Ovarian projects only)</li> </ul> </li> </ul> </p> <p><b>NOTE: TCGA strives to release most data in the open access tier. Individual genotype or sequence files are prominent exceptions. Commonly requested files such as descriptions of somatic mutations or clinical data are open access.</b></p> <p><b>The TCGA study is utilized in the following dbGaP substudies.</b> To view genotypes and other molecular data collected in these substudies, please click on the following substudies below or in the "Substudies" box located on the right hand side of this top-level study page phs000178 TCGA study. <ul> <li><a href="./study.cgi?study_id=phs000441">phs000441</a> Integrated Genomic Analyses of Ovarian Carcinoma (OV)</li> <li><a href="./study.cgi?study_id=phs000489">phs000489</a> Comprehensive Genomic Characterization Defines Human Glioblastoma Genes and Core Pathways</li> </ul> </p>
Project description:We reprocessed RNA-Seq data for 9264 tumor samples and 741 normal samples across 24 cancer types from The Cancer Genome Atlas with "Rsubread". Rsubread is an open source R package that has shown high concordance with other existing methods of alignment and summarization, but is simple to use and takes significantly less time to process data. Additionally, we provide clinical variables publicly available as of May 20, 2015 for the tumor samples where the TCGA ids are matched.
Project description:The Cancer Genome Atlas (TCGA) Isoform Expression Quantification Data is the largest publicly available resource of isomiR-level sequenced cancer data. Because the datasets were built up over years and by different contributing institutions, they are not free of batch effects. We evaluated different batch correction approaches to remove batch effects from the data; details of the best-performing algorithm and the batch variables are included in the supplementary file. Additionally, the annotated chromosomal end position of each isomiR feature was corrected by an offset of 1 to account for exclusive annotation.
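The end-position correction mentioned above can be illustrated with a minimal sketch. It assumes the original annotation used an exclusive end coordinate and that the correction subtracts 1 to obtain an inclusive end; the direction of the offset and the function name `to_inclusive_end` are assumptions for illustration, not taken from the supplementary file.

```python
def to_inclusive_end(start: int, end_exclusive: int) -> int:
    """Convert an exclusive end coordinate to an inclusive one.

    Assumption: the feature covers positions start .. end_exclusive - 1,
    so the corrected (inclusive) end is end_exclusive - 1.
    """
    if end_exclusive <= start:
        raise ValueError("exclusive end must be greater than start")
    return end_exclusive - 1


# Hypothetical feature annotated with start 100 and exclusive end 123
corrected_end = to_inclusive_end(100, 123)
```

Under this assumption, the feature length is unchanged (`end_exclusive - start`); only the reported end coordinate moves by one.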
Project description:The "bidirectional gene pair" indicates a particular head-to-head gene organization in which the transcription start sites of two genes are located on opposite strands of genomic DNA within a region of one kb. Although bidirectional gene pairs are well characterized, little is known about their expression profiles and regulatory features in tumorigenesis. We used RNA-seq data from The Cancer Genome Atlas (TCGA) for a systematic analysis of the expression profiles of bidirectional gene pairs in 13 cancer datasets. Three sets of control gene pairs were used: gene pairs on opposite strands with transcription end sites within one kb of each other (CG1), gene pairs on the same strand separated by 1-10 kb (CG2), and gene pairs comprising two randomly chosen genes (random). We identified and characterized up-/down-regulated genes by comparing expression levels between tumors and adjacent normal tissues in the 13 TCGA datasets. There was no consistently significant difference in the percentage of up-/down-regulated genes between bidirectional and control/random genes in most of the TCGA datasets. However, the percentage of bidirectional gene pairs comprising two up-regulated or two down-regulated genes was significantly higher than that of CG1 and CG2 gene pairs (in 12 and 11 of the analyzed TCGA datasets, respectively) and that of random gene pairs in all 13 TCGA datasets. We then identified methylation-correlated bidirectional genes to explore the regulatory mechanism of bidirectional genes. As with the differentially expressed gene pairs, the two genes of a bidirectional pair were significantly more likely to both be hypo- or both be hyper-methylation-correlated genes in 12/13 TCGA datasets when compared to the CG2/random gene pairs, despite no significant difference between the percentages of hypo-/hyper-methylation-correlated genes in bidirectional and CG2/random genes in most of the TCGA datasets. 
Finally, we explored the correlation between bidirectional genes and patient survival, identifying prognostic bidirectional genes and prognostic bidirectional gene pairs in each TCGA dataset. Remarkably, we found a group of prognostic bidirectional gene pairs in which the combination of two protein-coding genes with different expression levels correlated with different prognoses in the overall survival (OS) analysis. The percentage of these gene pairs among bidirectional gene pairs was significantly higher than among the control gene pairs in the COAD dataset and was lower in none of the 13 TCGA datasets.
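The definition of a bidirectional gene pair given above (two genes on opposite strands, arranged head-to-head, with transcription start sites within one kb) can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Gene` record, the `is_bidirectional` helper, the coordinate convention (`start` < `end` on the reference), and the requirement that the two TSSs do not cross are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    strand: str  # "+" or "-"
    start: int   # leftmost reference coordinate
    end: int     # rightmost reference coordinate

    @property
    def tss(self) -> int:
        # The TSS is the 5' end: leftmost for "+" genes, rightmost for "-" genes.
        return self.start if self.strand == "+" else self.end


def is_bidirectional(a: Gene, b: Gene, max_dist: int = 1000) -> bool:
    """Return True if a and b form a head-to-head pair with TSSs within max_dist bp."""
    if a.chrom != b.chrom or a.strand == b.strand:
        return False
    minus, plus = (a, b) if a.strand == "-" else (b, a)
    # Head-to-head: the minus-strand TSS lies at or to the left of the
    # plus-strand TSS, so the genes are transcribed away from each other.
    gap = plus.tss - minus.tss
    return 0 <= gap <= max_dist


# Hypothetical genes for illustration only
g_minus = Gene("GENE_A", "chr1", "-", 1000, 2000)
g_plus = Gene("GENE_B", "chr1", "+", 2500, 4000)
g_far = Gene("GENE_C", "chr1", "+", 5000, 6000)

pair_ok = is_bidirectional(g_minus, g_plus)   # TSS gap of 500 bp
pair_far = is_bidirectional(g_minus, g_far)   # TSS gap of 3000 bp
```

The control sets described in the text (CG1, CG2, random) would need analogous predicates on transcription end sites or same-strand distances; they are omitted here to keep the sketch short.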
Project description:The goal of the CPTAC, TCGA Cancer Proteome Study of Colorectal Tissue is to analyze the proteomes of TCGA tumor samples that have been comprehensively characterized by molecular methods. Ninety-five TCGA tumor samples were used in this study.