Large-scale turnover of functional transcription factor binding sites in Drosophila.
Ontology highlight
ABSTRACT: The gain and loss of functional transcription factor binding sites has been proposed as a major source of evolutionary change in cis-regulatory DNA and gene expression. We have developed an evolutionary model to study binding-site turnover that uses multiple sequence alignments to assess the evolutionary constraint on individual binding sites, and to map gain and loss events along a phylogenetic tree. We apply this model to study the evolutionary dynamics of binding sites of the Drosophila melanogaster transcription factor Zeste, using genome-wide in vivo (ChIP-chip) binding data to identify functional Zeste binding sites, and the genome sequences of D. melanogaster, D. simulans, D. erecta, and D. yakuba to study their evolution. We estimate that more than 5% of functional Zeste binding sites in D. melanogaster were gained along the D. melanogaster lineage or lost along one of the other lineages. We find that Zeste-bound regions have a reduced rate of binding-site loss and an increased rate of binding-site gain relative to flanking sequences. Finally, we show that binding-site gains and losses are asymmetrically distributed with respect to D. melanogaster, consistent with lineage-specific acquisition and loss of Zeste-responsive regulatory elements.
Project description:Differential binding of transcription factors (TFs) at cis-regulatory loci drives the differentiation and function of diverse cellular lineages. Understanding the regulatory interactions that underlie cell fate decisions requires characterizing TF binding sites (TFBS) across multiple cell types and conditions. Techniques, e.g. ChIP-Seq can reveal genome-wide patterns of TF binding, but typically requires laborious and costly experiments for each TF-cell-type (TFCT) condition of interest. Chromosomal accessibility assays can connect accessible chromatin in one cell type to many TFs through sequence motif mapping. Such methods, however, rarely take into account that the genomic context preferred by each factor differs from TF to TF, and from cell type to cell type. To address the differences in TF behaviors, we developed Mocap, a method that integrates chromatin accessibility, motif scores, TF footprints, CpG/GC content, evolutionary conservation and other factors in an ensemble of TFCT-specific classifiers. We show that integration of genomic features, such as CpG islands improves TFBS prediction in some TFCT. Further, we describe a method for mapping new TFCT, for which no ChIP-seq data exists, onto our ensemble of classifiers and show that our cross-sample TFBS prediction method outperforms several previously described methods.
Project description:A long-standing objective in modern biology is to characterize the molecular components that drive the development of an organism. At the heart of eukaryotic development lies gene regulation. On the molecular level, much of the research in this field has focused on the binding of transcription factors (TFs) to regulatory regions in the genome known as cis-regulatory modules (CRMs). However, relatively little is known about the sequence-specific binding preferences of many TFs, especially with respect to the possible interdependencies between the nucleotides that make up binding sites. A particular limitation of many existing algorithms that aim to predict binding site sequences is that they do not allow for dependencies between nonadjacent nucleotides. In this study, we use a recently developed computational algorithm, MARZ, to compare binding site sequences using 32 distinct models in a systematic and unbiased approach to explore nucleotide dependencies within binding sites for 15 distinct TFs known to be critical to Drosophila development. Our results indicate that many of these proteins have varying levels of nucleotide interdependencies within their DNA recognition sequences, and that, in some cases, models that account for these dependencies greatly outperform traditional models that are used to predict binding sites. We also directly compare the ability of different models to identify the known KRUPPEL TF binding sites in CRMs and demonstrate that a more complex model that accounts for nucleotide interdependencies performs better when compared with simple models. This ability to identify TFs with critical nucleotide interdependencies in their binding sites will lead to a deeper understanding of how these molecular characteristics contribute to the architecture of CRMs and the precise regulation of transcription during organismal development.
Project description:Changes in gene expression play an important role in evolution, yet the molecular mechanisms underlying regulatory evolution are poorly understood. Here we compare genome-wide binding of the six transcription factors that initiate segmentation along the anterior-posterior axis in embryos of two closely related species: Drosophila melanogaster and Drosophila yakuba. Where we observe binding by a factor in one species, we almost always observe binding by that factor to the orthologous sequence in the other species. Levels of binding, however, vary considerably. The magnitude and direction of the interspecies differences in binding levels of all six factors are strongly correlated, suggesting a role for chromatin or other factor-independent forces in mediating the divergence of transcription factor binding. Nonetheless, factor-specific quantitative variation in binding is common, and we show that it is driven to a large extent by the gain and loss of cognate recognition sequences for the given factor. We find only a weak correlation between binding variation and regulatory function. These data provide the first genome-wide picture of how modest levels of sequence divergence between highly morphologically similar species affect a system of coordinately acting transcription factors during animal development, and highlight the dominant role of quantitative variation in transcription factor binding over short evolutionary distances.
Project description:Cis-regulatory sequences are not always conserved across species. Divergence within cis-regulatory sequences may result from the evolution of species-specific patterns of gene expression or the flexible nature of the cis-regulatory code. The identification of functional divergence in cis-regulatory sequences is therefore important for both understanding the role of gene regulation in evolution and annotating regulatory elements. We have developed an evolutionary model to detect the loss of constraint on individual transcription factor binding sites (TFBSs). We find that a significant fraction of functionally constrained binding sites have been lost in a lineage-specific manner among three closely related yeast species. Binding site loss has previously been explained by turnover, where the concurrent gain and loss of a binding site maintains gene regulation. We estimate that nearly half of all loss events cannot be explained by binding site turnover. Recreating the mutations that led to binding site loss confirms that these sequence changes affect gene expression in some cases. We also estimate that there is a high rate of binding site gain, as more than half of experimentally identified S. cerevisiae binding sites are not conserved across species. The frequent gain and loss of TFBSs implies that cis-regulatory sequences are labile and, in the absence of turnover, may contribute to species-specific patterns of gene expression.
Project description:BackgroundThe binding of transcription factors to specific locations in the genome is integral to the orchestration of transcriptional regulation in cells. To characterize transcription factor binding site function on a large scale, we predicted and mutagenized 455 binding sites in human promoters. We carried out functional tests on these sites in four different immortalized human cell lines using transient transfections with a luciferase reporter assay, primarily for the transcription factors CTCF, GABP, GATA2, E2F, STAT, and YY1.ResultsIn each cell line, between 36% and 49% of binding sites made a functional contribution to the promoter activity; the overall rate for observing function in any of the cell lines was 70%. Transcription factor binding resulted in transcriptional repression in more than a third of functional sites. When compared with predicted binding sites whose function was not experimentally verified, the functional binding sites had higher conservation and were located closer to transcriptional start sites (TSSs). Among functional sites, repressive sites tended to be located further from TSSs than were activating sites. Our data provide significant insight into the functional characteristics of YY1 binding sites, most notably the detection of distinct activating and repressing classes of YY1 binding sites. Repressing sites were located closer to, and often overlapped with, translational start sites and presented a distinctive variation on the canonical YY1 binding motif.ConclusionsThe genomic properties that we found to associate with functional TF binding sites on promoters -- conservation, TSS proximity, motifs and their variations -- point the way to improved accuracy in future TFBS predictions.
Project description:Advances in sequencing technology have boosted population genomics and made it possible to map the positions of transcription factor binding sites (TFBSs) with high precision. Here we investigate TFBS variability by combining transcription factor binding maps generated by ENCODE, modENCODE, our previously published data and other sources with genomic variation data for human individuals and Drosophila isogenic lines.We introduce a metric of TFBS variability that takes into account changes in motif match associated with mutation and makes it possible to investigate TFBS functional constraints instance-by-instance as well as in sets that share common biological properties. We also take advantage of the emerging per-individual transcription factor binding data to show evidence that TFBS mutations, particularly at evolutionarily conserved sites, can be efficiently buffered to ensure coherent levels of transcription factor binding.Our analyses provide insights into the relationship between individual and interspecies variation and show evidence for the functional buffering of TFBS mutations in both humans and flies. In a broad perspective, these results demonstrate the potential of combining functional genomics and population genetics approaches for understanding gene regulation.
Project description:In the development of the Drosophila embryo, gene expression is directed by the sequence-specific interactions of a large network of protein transcription factors (TFs) and DNA cis-regulatory binding sites. Once the identity of the typically 8-10bp binding sites for any given TF has been determined by one of several experimental procedures, the sequences can be represented in a position weight matrix (PWM) and used to predict the location of additional TF binding sites elsewhere in the genome. Often, alignments of large (>200bp) genomic fragments that have been experimentally determined to bind the TF of interest in Chromatin Immunoprecipitation (ChIP) studies are trimmed under the assumption that the majority of the binding sites are located near the center of all the aligned fragments. In this study, ChIP/chip datasets are analyzed using the corresponding PWMs for the well-studied TFs; CAUDAL, HUNCHBACK, KNIRPS and KRUPPEL, to determine the distribution of predicted binding sites. All four TFs are critical regulators of gene expression along the anterio-posterior axis in early Drosophila development. For all four TFs, the ChIP peaks contain multiple binding sites that are broadly distributed across the genomic region represented by the peak, regardless of the prediction stringency criteria used. This result suggests that ChIP peak trimming may exclude functional binding sites from subsequent analyses.
Project description:The decoding of transcription factor (TF) binding signals in genomic DNA is a fundamental problem. Here we present a prediction model called BindSpace that learns to embed DNA sequences and TF labels into the same space. By training on binding data from hundreds of TFs and embedding over 1 M DNA sequences, BindSpace achieves state-of-the-art multiclass binding prediction performance, in vitro and in vivo, and can distinguish between signals of closely related TFs.
Project description:Transcription factor binding site(s) (TFBS) gain and loss (i.e., turnover) is a well-documented feature of cis-regulatory module (CRM) evolution, yet little attention has been paid to the evolutionary force(s) driving this turnover process. The predominant view, motivated by its widespread occurrence, emphasizes the importance of compensatory mutation and genetic drift. Positive selection, in contrast, although it has been invoked in specific instances of adaptive gene expression evolution, has not been considered as a general alternative to neutral compensatory evolution. In this study we evaluate the two hypotheses by analyzing patterns of single nucleotide polymorphism in the TFBS of well-characterized CRM in two closely related Drosophila species, Drosophila melanogaster and Drosophila simulans. An important feature of the analysis is classification of TFBS mutations according to the direction of their predicted effect on binding affinity, which allows gains and losses to be evaluated independently along the two phylogenetic lineages. The observed patterns of polymorphism and divergence are not compatible with neutral evolution for either class of mutations. Instead, multiple lines of evidence are consistent with contributions of positive selection to TFBS gain and loss as well as purifying selection in its maintenance. In discussion, we propose a model to reconcile the finding of selection driving TFBS turnover with constrained CRM function over long evolutionary time.
Project description:Variations in noncoding regulatory sequences play a central role in evolution but interpreting such variations remains difficult even in the context of defined attributes such as transcription factor (TF) binding sites. Here, we systematically link variations in cis-regulatory sequences to TF binding by profiling the allele-specific binding of 27 TFs expressed in a yeast hybrid, which contains two related genomes within the same nucleus. TFs localize preferentially to sites containing their known binding motifs, but occupy only a small fraction of the potential motif-coding genomic sites. Differential binding of TFs to the two orthologues is well explained by sequence variations that alter the consensus core motif, while changes in chromatin accessibility were of little apparent effect. Motif variations that abolish binding when present in one allele show only moderately reduced binding when common to both alleles, suggesting evolutionary compensation. At the promoter level, we report cases of turnover, where binding in the two orthologues localized to different promoter regions, yet most interspecies differences resulted from higher number of binding sites in one of the alleles. Our results suggest that cis variations affecting TF binding occur primarily through the gain and loss of cis-regulatory motifs at accessible regions.