Common DNA sequence variation influences 3-dimensional conformation of the human genome
ABSTRACT: The 3-dimensional (3D) conformation of chromatin inside the nucleus is integral to a variety of nuclear processes including transcriptional regulation, DNA replication, and DNA damage repair. Aberrations in 3D chromatin conformation have been implicated in developmental abnormalities and cancer. Despite the importance of 3D chromatin conformation to cellular function and human health, little is known about how 3D chromatin conformation varies in the human population, or whether DNA sequence variation between individuals influences 3D chromatin conformation. To address these questions, we performed Hi-C on Lymphoblastoid Cell Lines (LCLs) from a panel of 20 individuals. We identify thousands of regions across the genome where 3D chromatin conformation varies between individuals and find that these conformational variations are often accompanied by variations in gene expression, histone modifications, and transcription factor (TF) binding. Moreover, we find that DNA sequence variation influences several features of 3D chromatin conformation including loop strength, contact insulation, contact directionality and density of local cis contacts. We map hundreds of Quantitative Trait Loci (QTLs) associated with 3D chromatin features and find evidence that some of these same variants are associated at modest levels with other molecular phenotypes as well as complex disease risk. Our results demonstrate that common DNA sequence variants can influence 3D chromatin conformation, pointing to a more pervasive role for 3D chromatin conformation in human phenotypic variation than previously recognized.
Project description:The identification of cell-type-specific 3D chromatin interactions between regulatory elements can help to decipher gene regulation and to interpret the function of disease-associated non-coding variants. However, current chromosome conformation capture (3C) technologies are unable to resolve interactions at this resolution when only small numbers of cells are available as input. We therefore present ChromaFold, a deep learning model that predicts 3D contact maps and regulatory interactions from single-cell ATAC sequencing (scATAC-seq) data alone. ChromaFold uses pseudobulk chromatin accessibility, co-accessibility profiles across metacells, and predicted CTCF motif tracks as input features and employs a lightweight architecture to enable training on standard GPUs. Once trained on paired scATAC-seq and Hi-C data in human cell lines and tissues, ChromaFold can accurately predict both the 3D contact map and peak-level interactions across diverse human and mouse test cell types. In benchmarking against a recent deep learning method that uses bulk ATAC-seq, DNA sequence, and CTCF ChIP-seq to make cell-type-specific predictions, ChromaFold yields superior prediction performance when including CTCF ChIP-seq data as an input and comparable performance without. Finally, fine-tuning ChromaFold on paired scATAC-seq and Hi-C in a complex tissue enables deconvolution of chromatin interactions across cell subpopulations. ChromaFold thus achieves state-of-the-art prediction of 3D contact maps and regulatory interactions using scATAC-seq alone as input data, enabling accurate inference of cell-type-specific interactions in settings where 3C-based assays are infeasible.
Project description:The organization of chromatin into self-interacting domains is universal among eukaryotic genomes, though how and why they form varies considerably. Here we report a chromosome-scale reference genome assembly of pepper (Capsicum annuum) and explore its 3D organization through integrating high-resolution Hi-C maps with epigenomic, transcriptomic, and genetic variation data. Chromatin folding domains in pepper are as prominent as TADs in mammals but exhibit unique characteristics. They tend to coincide with heterochromatic regions enriched with retrotransposons and are frequently embedded in loops, which may correlate with transcription factories. Their boundaries are hotspots for chromosome rearrangements but are otherwise depleted for genetic variation. While chromatin conformation broadly affects transcription variance, it does not predict differential gene expression between tissues. Our results suggest that pepper genome organization is explained by a model of heterochromatin-driven folding promoted by transcription factories and that such spatial architecture is under structural and functional constraints.
Project description:Hox genes are essential regulators of embryonic development. They are activated in a temporal sequence following their topological order within their genomic clusters. Subsequently, states of activity are fine-tuned and maintained to translate into domains of progressively overlapping gene products. While the mechanisms underlying such temporal and spatial progressions are beginning to be understood, many of their aspects remain unclear. We have systematically analyzed the 3D chromatin organization of Hox clusters in vivo during their activation, using high-resolution circular chromosome conformation capture (4C-seq). Initially, Hox clusters are organized as single 3D chromatin compartments decorated with bivalent chromatin marks. Their progressive transcriptional activation is associated with a dynamic bi-modal 3D organization, whereby the genes switch, one after the other, from an inactive to an active 3D compartment. These local 3D dynamics occur within a larger constitutive framework of interactions within the surrounding Topologically Associating Domains, which confirms previous results that regulation of this process is primarily cluster-intrinsic. The local step-wise progression in time can be stopped and memorized at various body levels, and hence it may account for the various chromatin architectures previously described at different anterior-to-posterior body levels for the same embryo at a later stage. Circular Chromosome Conformation Capture (4C-seq) samples from mouse ES cells and mouse embryonic samples at different stages of development. Data based on 41 biological samples.
Project description:Variation in regulatory DNA is thought to drive evolution. Cross-species comparisons of regulatory DNA have provided evidence for both weak purifying selection and substantial turnover in regulatory regions. However, disruption of transcription factor binding sites can affect the expression of neighboring genes. Thus, the base-pair level functional annotation of regulatory DNA has proven challenging. Here, we explore regulatory DNA variation and its functional consequences in genetically diverse strains of the plant Arabidopsis thaliana, which largely maintain the positional homology of regulatory DNA. Using chromatin accessibility to delineate regulatory DNA genome-wide, we find that 15% of approximately 50,000 regulatory sites varied in accessibility among strains. Some of these accessibility differences are associated with extensive underlying sequence variation, encompassing many deletions and dramatically hypervariable sequence. For the majority of such regulatory sites, nearby gene expression was similar, despite this large genetic variation. However, among all regulatory sites, those with both high levels of sequence variation and differential chromatin accessibility are the most likely to reside near genes with differential expression among strains. Unexpectedly, the vast majority of regulatory sites that differed in chromatin accessibility among strains show little variation in the underlying DNA sequence, implicating variation in upstream regulators.
Project description:Induced pluripotent stem cells (iPSCs) are an essential tool for studying cellular differentiation and cell types that are otherwise difficult to access. Here we investigate the use of iPSCs and iPSC-derived cells to study the impact of genetic variation across different cell types and as models for the genetics of complex disease. We established a panel of iPSCs from 58 well-studied Yoruba lymphoblastoid cell lines (LCLs); 14 of these lines were further differentiated into cardiomyocytes. We characterized regulatory variation across individuals and cell types by measuring RNA, chromatin accessibility and DNA methylation. Regulatory variation between individuals is lower in iPSCs than in the differentiated cell types, consistent with the intuition that developmental processes are generally canalized. While most cell-type-specific regulatory effects lie in chromatin that is open only in the affected cell types, we find that 20% of cell-type-specific effects are in shared open chromatin. Finally, we developed deep neural network models to predict open chromatin regions in these cell types from DNA sequence alone and were able to use the sequences of segregating haplotypes to predict the effects of common SNPs on tissue-specific chromatin accessibility. Our results provide a framework for using iPSC technology to study regulatory variation in cell types that are otherwise inaccessible. Keywords: Expression profiling by high throughput sequencing
Project description:We developed a targeted chromosome conformation capture (4C) approach that uses unique molecular identifiers (UMI) to derive high complexity quantitative chromosome contact profiles with controlled signal to noise ratios. We demonstrate that the method improves the sensitivity and specificity for detection of long-range chromosomal interactions, and that it allows the design of interaction screens with predictable statistical power. UMI-4C robustly quantifies contact intensity changes between cell types and conditions, opening the way toward incorporation of long-range interactions in quantitative models of gene regulation. We constructed UMI-4C profiles of 13 different genomic loci (viewpoints) in five different cell lines, in order to study the 3D chromatin contact maps of these selected loci. The coordinates for these viewpoints are: G1p1 chrX:48646542; baitG1_3_5kb chrX:48641393; bait_50kb chrX:48595987; bait_165kb chrX:48476525; ANK1 chr8:41654693; hbb_3HS chr11:5221346; hbb_HBB chr11:5248714; hbb_HBBP1_G1 chr11:5266532; HBB_HBE chr11:5292159; HBB_HS2 chr11:5301345; HBB_HS3 chr11:5306690; HBB_HS5 chr11:5313539; HBB_HBD chr11:5256597
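The viewpoint list in the UMI-4C description above is given as free text. As a minimal sketch (not part of the original submission), the following Python snippet shows one way to turn it into a structured lookup. The helper name `parse_viewpoints` and the variable names are hypothetical; the description states neither the genome assembly nor the coordinate convention, so the positions are kept exactly as written.

```python
# Viewpoint list copied verbatim from the project description.
# The genome assembly and coordinate convention are not stated in the source.
VIEWPOINTS_TEXT = (
    "G1p1 chrX:48646542; baitG1_3_5kb chrX:48641393; bait_50kb chrX:48595987; "
    "bait_165kb chrX:48476525; ANK1 chr8:41654693; hbb_3HS chr11:5221346; "
    "hbb_HBB chr11:5248714; hbb_HBBP1_G1 chr11:5266532; HBB_HBE chr11:5292159; "
    "HBB_HS2 chr11:5301345; HBB_HS3 chr11:5306690; HBB_HS5 chr11:5313539; "
    "HBB_HBD chr11:5256597"
)


def parse_viewpoints(text):
    """Hypothetical helper: map each viewpoint name to (chromosome, position)."""
    viewpoints = {}
    for entry in text.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        # Each entry has the form "<name> <chrom>:<position>".
        name, locus = entry.split()
        chrom, pos = locus.split(":")
        viewpoints[name] = (chrom, int(pos))
    return viewpoints


viewpoints = parse_viewpoints(VIEWPOINTS_TEXT)

# Group viewpoints by chromosome, sorted by position within each chromosome.
by_chrom = {}
for name, (chrom, pos) in viewpoints.items():
    by_chrom.setdefault(chrom, []).append((pos, name))
for chrom in by_chrom:
    by_chrom[chrom].sort()

for chrom, entries in by_chrom.items():
    for pos, name in entries:
        print(f"{chrom}\t{pos}\t{name}")
```

Grouping by chromosome makes it easy to see that most viewpoints cluster on chr11 and chrX, with a single viewpoint on chr8.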
Project description:Dynamic 3D chromatin conformation is a critical mechanism for gene regulation during development and disease. Despite this, profiling of 3D genome structure from complex tissues with cell-type specific resolution remains challenging. Recent efforts have demonstrated that cell-type specific epigenomic features can be resolved in complex tissues using single-cell assays. However, it remains unclear whether single-cell Chromatin Conformation Capture (3C) or Hi-C profiles can effectively identify cell types and reconstruct cell-type specific chromatin conformation maps. To address these challenges, we have developed single-nucleus methyl-3C sequencing (sn-m3C-seq) to capture chromatin organization and DNA methylation information and robustly separate heterogeneous cell types. Applying this method to >4,200 single human brain prefrontal cortex cells, we reconstruct cell-type specific chromatin conformation maps from 14 cortical cell types. These datasets reveal the genome-wide association between cell-type specific chromatin conformation and differential DNA methylation, suggesting pervasive interactions between epigenetic processes regulating gene expression.
Project description:Background: Although genetic or epigenetic alterations have been shown to affect the three-dimensional organization of genomes, the utility of chromatin conformation in the classification of human disease has never been addressed. Results: Here, we explore whether chromatin conformation can be used to classify human leukemia. We map the conformation of the HOXA gene cluster in a panel of cell lines with 5C chromosome conformation capture technology, and use the data to train and test a support vector machine classifier named 3D-SP. We show that 3D-SP is able to accurately distinguish leukemias expressing MLL-fusion proteins from those expressing only wild-type MLL, and that it can also classify leukemia subtypes according to MLL fusion partner, based solely on 5C data. Conclusions: Our study provides the first proof-of-principle demonstration that chromatin conformation contains the information value necessary for classification of leukemia subtypes. Examination of CTCF and RAD21 binding sites in THP-1 cells.
Project description:Background: Although genetic or epigenetic alterations have been shown to affect the three-dimensional organization of genomes, the utility of chromatin conformation in the classification of human disease has never been addressed. Results: Here, we explore whether chromatin conformation can be used to classify human leukemia. We map the conformation of the HOXA gene cluster in a panel of cell lines with 5C chromosome conformation capture technology, and use the data to train and test a support vector machine classifier named 3D-SP. We show that 3D-SP is able to accurately distinguish leukemias expressing MLL-fusion proteins from those expressing only wild-type MLL, and that it can also classify leukemia subtypes according to MLL fusion partner, based solely on 5C data. Conclusions: Our study provides the first proof-of-principle demonstration that chromatin conformation contains the information value necessary for classification of leukemia subtypes. Analysis of 38 samples using 5C technology. All data normalized using a 'master' BAC consisting of 5C data from 6 samples.
Project description:Although the 3D genome architecture is essential for long-range gene regulation, the significance of physical chromatin interactions is challenged by recent findings of mutual insensitivity between contact propensity and gene expression. Here we report hundred-base-pair-resolution profiling of chromatin conformation in 33 colon tissues, tracing the formation and malignant transformation of colorectal polyps. We identified progressive genome-wide decay of chromatin structures such as interaction stripes and loops on all types of cis-regulatory elements, particularly promoters, independent of alterations in DNA methylation and chromatin accessibility. Rather than correlating linearly with the degree of structural decay, transcription levels shifted toward correcting their quantitative mismatch with the corresponding promoter contact propensity. These observations suggest that transcription becomes increasingly sensitive to the integrity of regulatory architecture as that integrity is lost during early cancer development, a mechanism which may provide novel insights into the misregulation of cancer-driving genes.