Project description:We experimentally determined Mb-scale haplotypes from germline and cancer genome sequencing. Using a microfluidic process that partitions long DNA fragments into hundreds of thousands of barcoded reactions, we determined haplotype blocks from matched colorectal cancer samples to delineate the haplotypes of allelic imbalances and other genomic instability events.
Project description:Repetitive sequences are hotspots of evolution at multiple levels. However, due to technical difficulties involved in their assembly and analysis, the role of repeats in tumor evolution is poorly understood. We developed a rigorous motif-based methodology to quantify variations in the repeat content of proteomes and genomes, directly from proteomic and genomic raw sequence data, and applied it to analyze a wide range of tumors and normal tissues. We identify high similarity between the repeat-instability in tumors and their patient-matched normal tissues, but also tumor-specific signatures, both in protein expression and in the genome, that strongly correlate with cancer progression and robustly predict the tumorigenic state. In a patient, the hierarchy of genomic repeat instability signatures accurately reconstructs tumor evolution, with primary tumors differentiated from metastases. We find an inverse relationship between repeat-instability and point mutation load, within and across patients, and independently of other somatic aberrations. Thus, repeat-instability is a distinct, transient and compensatory adaptive mechanism in tumor evolution.
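As a rough illustration of what quantifying repeat content directly from raw sequence data can look like, the sketch below measures the fraction of read bases covered by tandem runs of a given motif and compares a tumor sample with its matched normal. It is a minimal sketch under stated assumptions, not the methodology of the project above: the function names `motif_repeat_content` and `repeat_shift`, the `min_copies` threshold, and the use of a simple difference as the comparison are all hypothetical choices made for illustration.

```python
import re


def motif_repeat_content(reads, motif, min_copies=3):
    """Fraction of read bases lying in tandem runs of `motif`.

    A run counts only if it has at least `min_copies` consecutive copies.
    Both the threshold and this definition are illustrative assumptions.
    """
    # Builds a pattern such as (?:AC){3,} for motif "AC", min_copies 3.
    pattern = re.compile(f"(?:{re.escape(motif)}){{{min_copies},}}")
    covered = 0
    total = 0
    for read in reads:
        total += len(read)
        covered += sum(m.end() - m.start() for m in pattern.finditer(read))
    return covered / total if total else 0.0


def repeat_shift(tumor_reads, normal_reads, motif, min_copies=3):
    """Difference in motif repeat content, tumor minus matched normal
    (a hypothetical summary statistic)."""
    return (motif_repeat_content(tumor_reads, motif, min_copies)
            - motif_repeat_content(normal_reads, motif, min_copies))


if __name__ == "__main__":
    normal = ["ACACACACGT", "GGGGGG"]
    tumor = ["ACACACAC"]
    print(motif_repeat_content(normal, "AC"))
    print(repeat_shift(tumor, normal, "AC"))
```

Working from unassembled reads avoids the assembly step that the description identifies as a technical difficulty for repeats. A real analysis would also need to handle reverse complements, sequencing errors, and normalization for sequencing depth.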
Project description:Genetic variation amongst individual humans occurs on many different scales, ranging from gross alterations in the human karyotype to single-nucleotide changes. In this manuscript we explore variation on an intermediate scale, particularly insertions, deletions, and inversions affecting from a few thousand to a few million base pairs. We employed a clone-based method to interrogate this intermediate structural variation in eight individuals of diverse geographic ancestry. Our analysis provides a comprehensive overview of the normal pattern of structural variation present in these genomes, refining the location of 1695 structural variants. We find that 50% were seen in more than one individual and that nearly half lay outside regions of the genome previously described as structurally variant. We discover 525 new insertion sequences that are not present in the human reference genome and show that many of these are variable in copy number among individuals. Sequencing of a subset of structural variants reveals considerable locus complexity and provides insights into the different mutational processes that have shaped the human genome. These data provide the first high-resolution sequence map of human structural variation, an important standard for genotyping platforms and a prelude to future individual genome sequencing projects. Keywords: comparative genomic hybridization, copy number variation, structural variation, fosmid end sequencing. CGH analysis targeted against sites identified by fosmid end sequencing. Eight HapMap samples (sources of libraries ABC7-ABC14) are hybridized against NA15510 (source of fosmid library G248).
Project description:We have developed a method for mapping unmethylated sites in the human genome based on the resistance of TspR1-digested ends to exoIII nuclease degradation. Digestion with TspR1 and the methylation-sensitive restriction endonuclease HpaII, followed by exoIII and single-strand DNA nuclease, allows the removal of DNA fragments containing unmethylated HpaII sites. We then use array CGH to map the sequences depleted by this procedure in human genomes derived from five human tissues, a primary breast tumor and two breast tumor cell lines. Analysis of methylation patterns of the normal tissue genomes indicates that the hypomethylated sites are enriched in the 5’ end of widely expressed genes, including the promoter, first exon and first intron. In contrast, genomes of the MCF-7 and MDA-MB-231 cell lines show extensive hypomethylation in the intragenic and intergenic regions, whereas the primary tumor exhibits a pattern intermediate between normal tissue and cell lines. A striking characteristic of tumor genomes is the presence of megabase-sized hypomethylated zones. These hypomethylated zones are associated with large genes, fragile sites, evolutionary breakpoints, chromosomal rearrangement breakpoints, tumor suppressor genes, and with regions containing tissue-specific gene clusters or with gene-poor regions containing novel tissue-specific genes. Bisulfite sequencing analysis shows a novel mosaic methylation pattern, with alternating methylated and unmethylated zones, in human histone gene clusters on chromosome 6. Correlation with microarray analysis shows that genes with hypomethylated sequence 2 kb upstream or downstream of the transcription start site are highly expressed, whereas genes with extensive intragenic and 3’ UTR hypomethylation are silenced. The method described herein can be used for large-scale screening of changes in methylation pattern in the genome of interest. Keywords: Genome-Wide Mapping of Hypomethylated Sites in Human Genomes
Project description:Genetic variation amongst individual humans occurs on many different scales, ranging from gross alterations in the human karyotype to single-nucleotide changes. In this manuscript we explore variation on an intermediate scale, particularly insertions, deletions, and inversions affecting from a few thousand to a few million base pairs. We employed a clone-based method to interrogate this intermediate structural variation in eight individuals of diverse geographic ancestry. Our analysis provides a comprehensive overview of the normal pattern of structural variation present in these genomes, refining the location of 1695 structural variants. We find that 50% were seen in more than one individual and that nearly half lay outside regions of the genome previously described as structurally variant. We discover 525 new insertion sequences that are not present in the human reference genome and show that many of these are variable in copy number among individuals. Sequencing of a subset of structural variants reveals considerable locus complexity and provides insights into the different mutational processes that have shaped the human genome. These data provide the first high-resolution sequence map of human structural variation, an important standard for genotyping platforms and a prelude to future individual genome sequencing projects. Keywords: comparative genomic hybridization, copy number variation, structural variation, fosmid end sequencing
Project description:Genetic variation amongst individual humans occurs on many different scales, ranging from gross alterations in the human karyotype to single-nucleotide changes. In this manuscript we explore variation on an intermediate scale, particularly insertions, deletions, and inversions affecting from a few thousand to a few million base pairs. We employed a clone-based method to interrogate this intermediate structural variation in eight individuals of diverse geographic ancestry. Our analysis provides a comprehensive overview of the normal pattern of structural variation present in these genomes, refining the location of 1695 structural variants. We find that 50% were seen in more than one individual and that nearly half lay outside regions of the genome previously described as structurally variant. We discover 525 new insertion sequences that are not present in the human reference genome and show that many of these are variable in copy number among individuals. Sequencing of a subset of structural variants reveals considerable locus complexity and provides insights into the different mutational processes that have shaped the human genome. These data provide the first high-resolution sequence map of human structural variation, an important standard for genotyping platforms and a prelude to future individual genome sequencing projects. Keywords: comparative genomic hybridization
Project description:Understanding 3D genome structure requires high-throughput, genome-wide approaches. However, assays for all vs. all chromatin interaction mapping are expensive and time consuming, which severely restricts their usage for large-scale mutagenesis screens or for mapping the impact of sequence variants. Computational models sophisticated enough to grasp the determinants of chromatin folding provide a unique window into the functional determinants of 3D genome structure as well as the effects of genome variation. A chromatin interaction predictor should work at the base pair level but also incorporate large-scale genomic context, so that it captures both the large-scale and the fine-grained structures of chromatin architecture. Similarly, to be a flexible and generalisable approach, it should also be applicable to data it has not been explicitly trained on. To develop a model with these properties, we designed a deep neural network (deepC) that utilizes transfer learning to accurately predict chromatin interactions from DNA sequence at megabase scale. The model generalizes well to unseen chromosomes and works across cell types, Hi-C data resolutions and a range of sequencing depths. DeepC integrates DNA sequence context on an unprecedented scale, bridging the different levels of resolution from base pairs to TADs. We demonstrate how this model allows us to investigate sequence determinants of chromatin folding at genome-wide scale and to predict the importance of regulatory elements and the impact of sequence variations.