Project description:Most tumor samples are a heterogeneous mixture of cells, including admixture by normal (non-cancerous) cells and subpopulations of cancerous cells with different complements of somatic aberrations. This intra-tumor heterogeneity complicates the analysis of somatic aberrations in DNA sequencing data from tumor samples.We describe an algorithm called THetA2 that infers the composition of a tumor sample-including not only tumor purity but also the number and content of tumor subpopulations-directly from both whole-genome (WGS) and whole-exome (WXS) high-throughput DNA sequencing data. This algorithm builds on our earlier Tumor Heterogeneity Analysis (THetA) algorithm in several important directions. These include improved ability to analyze highly rearranged genomes using a variety of data types: both WGS sequencing (including low ?7× coverage) and WXS sequencing. We apply our improved THetA2 algorithm to WGS (including low-pass) and WXS sequence data from 18 samples from The Cancer Genome Atlas (TCGA). We find that the improved algorithm is substantially faster and identifies numerous tumor samples containing subclonal populations in the TCGA data, including in one highly rearranged sample for which other tumor purity estimation algorithms were unable to estimate tumor purity.
Project description:Functional disruptions of susceptibility genes by large genomic structure variant (SV) deletions in germlines are known to be associated with cancer risk. However, few studies have been conducted to systematically search for SV deletions in breast cancer susceptibility genes. We analysed deep (> 30x) whole-genome sequencing (WGS) data generated in blood samples from 128 breast cancer patients of Asian and European descent with either a strong family history of breast cancer or early cancer onset disease. To identify SV deletions in known or suspected breast cancer susceptibility genes, we used multiple SV calling tools including Genome STRiP, Delly, Manta, BreakDancer and Pindel. SV deletions were detected by at least three of these bioinformatics tools in five genes. Specifically, we identified heterozygous deletions covering a fraction of the coding regions of BRCA1 (with approximately 80kb in two patients), and TP53 genes (with ∼1.6 kb in two patients), and of intronic regions (∼1 kb) of the PALB2 (one patient), PTEN (three patients) and RAD51C genes (one patient). We confirmed the presence of these deletions using real-time quantitative PCR (qPCR). Our study identified novel SV deletions in breast cancer susceptibility genes and the identification of such SV deletions may improve clinical testing.
Project description:The Genetic Association Information Network (GAIN) Data Access Committee was established in June 2007 to provide prompt and fair access to data from six genome-wide association studies through the database of Genotypes and Phenotypes (dbGaP). Of 945 project requests received through 2011, 749 (79%) have been approved; median receipt-to-approval time decreased from 14 days in 2007 to 8 days in 2011. Over half (54%) of the proposed research uses were for GAIN-specific phenotypes; other uses were for method development (26%) and adding controls to other studies (17%). Eight data-management incidents, defined as compromises of any of the data-use conditions, occurred among nine approved users; most were procedural violations, and none violated participant confidentiality. Over 5 years of experience with GAIN data access has demonstrated substantial use of GAIN data by investigators from academic, nonprofit, and for-profit institutions with relatively few and contained policy violations. The availability of GAIN data has allowed for advances in both the understanding of the genetic underpinnings of mental-health disorders, diabetes, and psoriasis and the development and refinement of statistical methods for identifying genetic and environmental factors related to complex common diseases.