Project description:Sea stars and sea urchins are model systems for interrogating the types of deep evolutionary changes that have restructured developmental gene regulatory networks (GRNs). Although cis-regulatory DNA evolution is likely the predominant mechanism of change, it was recently shown that Tbrain, a Tbox transcription factor protein, has evolved a changed preference for a low-affinity, secondary binding motif. The primary, high-affinity motif is conserved. To date, however, no genome-wide comparisons have been performed to provide an unbiased assessment of the evolution of GRNs between these taxa, and no study has attempted to determine the interplay between transcription factor binding motif evolution and GRN topology. The study here measures genome-wide binding of Tbrain orthologs by using ChIP-sequencing and associates these orthologs with putative target genes to assess global function. Targets of both factors are enriched for other regulatory genes, although nonoverlapping sets of functional enrichments in the two datasets suggest a much diverged function. The number of low-affinity binding motifs is significantly depressed in sea urchins compared with sea star, but both motif types are associated with genes from a range of functional categories. Only a small fraction (∼10%) of genes are predicted to be orthologous targets. Collectively, these data indicate that Tbr has evolved significantly different developmental roles in these echinoderms and that the targets and the binding motifs in associated cis-regulatory sequences are dispersed throughout the hierarchy of the GRN, rather than being biased toward terminal process or discrete functional blocks, which suggests extensive evolutionary tinkering.
Project description:Eukaryotic transcription factors (TFs) from the same structural family tend to bind similar DNA sequences, despite the ability of these TFs to execute distinct functions in vivo. The cell partly resolves this specificity paradox through combinatorial strategies and the use of low-affinity binding sites, which are better able to distinguish between similar TFs. However, because these sites have low affinity, it is challenging to understand how TFs recognize them in vivo. Here, we summarize recent findings and technological advancements that allow for the quantification and mechanistic interpretation of TF recognition across a wide range of affinities. We propose a model that integrates insights from the fields of genetics and cell biology to provide further conceptual understanding of TF binding specificity. We argue that in eukaryotes, target specificity is driven by an inhomogeneous 3D nuclear distribution of TFs and by variation in DNA binding affinity such that locally elevated TF concentration allows low-affinity binding sites to be functional.
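The closing argument, that locally elevated TF concentration can make low-affinity sites functional, can be illustrated with a minimal sketch. It assumes simple single-site equilibrium binding, where fractional occupancy is c / (c + Kd). This model, the function name `occupancy`, and all concentrations and dissociation constants below are illustrative assumptions, not values or methods taken from the abstract.

```python
def occupancy(tf_conc_nm: float, kd_nm: float) -> float:
    """Fractional occupancy of one site under simple equilibrium binding.

    Assumption: non-cooperative single-site model, theta = c / (c + Kd).
    """
    if tf_conc_nm < 0 or kd_nm <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return tf_conc_nm / (tf_conc_nm + kd_nm)


# Hypothetical numbers chosen only for illustration.
KD_HIGH_AFFINITY_NM = 10.0    # assumed high-affinity site
KD_LOW_AFFINITY_NM = 1000.0   # assumed low-affinity site
BULK_CONC_NM = 10.0           # assumed nucleus-wide average TF concentration
LOCAL_CONC_NM = 1000.0        # assumed concentration inside a local TF-rich region

for label, conc in [("bulk", BULK_CONC_NM), ("local", LOCAL_CONC_NM)]:
    high = occupancy(conc, KD_HIGH_AFFINITY_NM)
    low = occupancy(conc, KD_LOW_AFFINITY_NM)
    print(f"{label}: high-affinity {high:.3f}, low-affinity {low:.3f}")
```

Under these assumed numbers, the low-affinity site is occupied about 1% of the time at the bulk concentration but about half the time at the locally elevated concentration, which is the qualitative effect the model describes.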
Project description:Transcription factors (TFs) alter gene expression in response to changes in the environment through sequence-specific interactions with the DNA. These interactions are best portrayed as a landscape of TF binding affinities. Current methods to study sequence-specific binding preferences suffer from limited dynamic range, sequence bias, lack of specificity and limited throughput. We have developed a microfluidic-based device for SELEX Affinity Landscape MAPping (SELMAP) of TF binding, which allows high-throughput measurement of 16 proteins in parallel. We used it to measure the relative affinities of Pho4, AtERF2 and Btd full-length proteins to millions of different DNA binding sites, and detected both high- and low-affinity interactions in equilibrium conditions, generating a comprehensive landscape of the relative TF affinities to all possible DNA 6-mers, and even DNA 10-mers with increased sequencing depth. Low quantities of both the TFs and DNA oligomers were sufficient for obtaining high-quality results, significantly reducing experimental costs. SELMAP allows in-depth screening of hundreds of TFs, and provides a means for better understanding of the regulatory processes that govern gene expression.
Project description:Sequence-specific DNA-binding proteins including transcription factors (TFs) are key determinants of gene regulation and chromatin architecture. TF profiling is commonly carried out by formaldehyde cross-linking and sonication followed by chromatin immunoprecipitation (X-ChIP). We describe a method to profile TF binding at high resolution without cross-linking. We begin with micrococcal nuclease-digested non-cross-linked chromatin and then perform affinity purification of TFs and paired-end sequencing. The resulting occupied regions of genomes from affinity-purified naturally isolated chromatin (ORGANIC) profiles of Saccharomyces cerevisiae Abf1 and Reb1 provide high-resolution maps that are accurate, as defined by the presence of known TF consensus motifs in identified binding sites, that are not biased toward accessible chromatin and that do not require input normalization. We profiled Drosophila melanogaster GAGA factor and Pipsqueak to test ORGANIC performance on larger genomes. Our results suggest that ORGANIC profiling is a widely applicable high-resolution method for sensitive and specific profiling of direct protein-DNA interactions.
Project description:Sequence-specific binding by transcription factors (TFs) interprets regulatory information encoded in the genome. Using recently published universal protein binding microarray (PBM) data on the in vitro DNA binding preferences of these proteins for all possible 8-base-pair sequences, we examined the evolutionary conservation and enrichment within putative regulatory regions of the binding sequences of a diverse library of 104 nonredundant mouse TFs spanning 22 different DNA-binding domain structural classes. We found that not only high affinity binding sites, but also numerous moderate and low affinity binding sites, are under negative selection in the mouse genome. These 8-mers occur preferentially in putative regulatory regions of the mouse genome, including CpG islands and non-exonic ultraconserved elements (UCEs). Of TFs whose PBM "bound" 8-mers are enriched within sets of tissue-specific UCEs, many are expressed in the same tissue(s) as the UCE-driven gene expression. Phylogenetically conserved motif occurrences of various TFs were also enriched in the noncoding sequence surrounding numerous gene sets corresponding to Gene Ontology categories and tissue-specific gene expression clusters, suggesting involvement in transcriptional regulation of those genes. Altogether, our results indicate that many of the sequences bound by these proteins in vitro, including lower affinity DNA sequences, are likely to be functionally important in vivo. This study not only provides an initial analysis of the potential regulatory associations of 104 mouse TFs, but also presents an approach for the functional analysis of TFs from any other metazoan genome as their DNA binding preferences are determined by PBMs or other technologies.
Project description:Gene expression is regulated in part by protein transcription factors that bind target regulatory DNA sequences. Predicting DNA binding sites and affinities from transcription factor sequence or structure is difficult; therefore, experimental data are required to link transcription factors to target sequences. We present a microfluidics-based approach for de novo discovery and quantitative biophysical characterization of DNA target sequences. We validated our technique by measuring sequence preferences for 28 Saccharomyces cerevisiae transcription factors with a variety of DNA-binding domains, including several that have proven difficult to study by other techniques. For each transcription factor, we measured relative binding affinities to oligonucleotides covering all possible 8-bp DNA sequences to create a comprehensive map of sequence preferences; for four transcription factors, we also determined absolute affinities. We expect that these data and future use of this technique will provide information essential for understanding transcription factor specificity, improving identification of regulatory sites and reconstructing regulatory interactions.
Project description:Growth hormone regulates its biological properties via a sequential hormone-induced receptor homodimerization mechanism. Using a mutagenesis-scanning analysis of 81 single and 32 pairwise double mutations, we show that the hormone's two spatially distal receptor binding sites (Site1 and Site2) are allosterically coupled. These allosteric effects are focused among relatively few residues centered around the interaction between Asp-116 of the hormone and Trp-169 of the receptor in Site2. A rearrangement of this interaction triggered by mutations in Site1 produces both a major conformational and energetic reorganization of Site2, surprisingly without a reduction in overall binding affinity. Additionally, the data suggest a change in the conformational dynamics of several groups in Site2 that appear to be important in defining the Site2 interaction. Changes in binding energy of the affected Site2 residues usually range in magnitude from 3- to 60-fold, but in one case are as large as 10^4-fold.
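For scale, the fold changes quoted above can be converted to free-energy differences. This is a rough worked conversion, assuming that each fold change is a ratio of dissociation constants and that the temperature is 25 °C; neither assumption is stated in the abstract.

```latex
\Delta\Delta G = RT \ln\!\left(\frac{K_d^{\mathrm{mut}}}{K_d^{\mathrm{wt}}}\right),
\qquad RT \approx 0.593\ \mathrm{kcal\,mol^{-1}} \text{ at } 298\ \mathrm{K}

\begin{aligned}
3\text{-fold}:    &\quad \Delta\Delta G \approx 0.593 \times \ln 3 \approx 0.65\ \mathrm{kcal\,mol^{-1}} \\
60\text{-fold}:   &\quad \Delta\Delta G \approx 0.593 \times \ln 60 \approx 2.4\ \mathrm{kcal\,mol^{-1}} \\
10^{4}\text{-fold}: &\quad \Delta\Delta G \approx 0.593 \times \ln 10^{4} \approx 5.5\ \mathrm{kcal\,mol^{-1}}
\end{aligned}
```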
Project description:To represent the sequence specificity of transcription factors, the position weight matrix (PWM) is widely used. In most cases, each element is defined as a log likelihood ratio of a base appearing at a certain position, which is estimated from a finite number of known binding sites. To avoid bias due to this small sample size, a certain numeric value, called a pseudocount, is usually allocated for each position, and its fraction according to the background base composition is added to each element. So far, there has been no consensus on the optimal pseudocount value. In this study, we simulated the sampling process by artificially generating binding sites based on observed nucleotide frequencies in a public PWM database, and then the generated matrix with an added pseudocount value was compared to the original frequency matrix using various measures. Although the results were somewhat different between measures, in many cases, we could find an optimal pseudocount value for each matrix. These optimal values are independent of the sample size and are clearly correlated with the entropy of the original matrices, meaning that larger pseudocount values are preferable for less conserved binding sites. As a simple representative, we suggest the value of 0.8 for practical uses.
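The PWM construction described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a uniform background (0.25 per base), log base 2, and the common convention that a total pseudocount is split across bases in proportion to the background. The function names and the four example sites are hypothetical.

```python
import math

BASES = "ACGT"


def count_matrix(sites):
    """Count each base at each position of equal-length binding sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    return [{b: sum(1 for s in sites if s[i] == b) for b in BASES}
            for i in range(length)]


def pwm_from_counts(counts, pseudocount=0.8, background=None):
    """Build a log2 likelihood-ratio PWM.

    Each position gets `pseudocount` extra observations in total,
    distributed across bases according to the background composition.
    """
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform background
    pwm = []
    for column in counts:
        n = sum(column[b] for b in BASES)
        row = {}
        for b in BASES:
            freq = (column[b] + pseudocount * background[b]) / (n + pseudocount)
            row[b] = math.log2(freq / background[b])
        pwm.append(row)
    return pwm


def score(pwm, sequence):
    """Sum the per-position log-likelihood ratios for one candidate site."""
    return sum(row[base] for row, base in zip(pwm, sequence))


# Hypothetical example: four known sites of length 2.
example_sites = ["AC", "AC", "AG", "AC"]
example_pwm = pwm_from_counts(count_matrix(example_sites), pseudocount=0.8)
print(score(example_pwm, "AC"), score(example_pwm, "TT"))
```

Without the pseudocount, bases never observed at a position would have frequency zero and a log ratio of negative infinity. With it, every element stays finite, and the correction shrinks as the number of known sites grows.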
Project description:Background: High-throughput in vivo protein-DNA interaction experiments are currently widely used in gene regulation studies. Hitherto, comprehensive data analysis remains a challenge and for that reason most computational methods only consider the top few hundred or thousand strongest protein binding sites whereas weak protein binding sites are completely ignored. Results: A new biophysical model of protein-DNA interactions, BayesPI2+, was developed to address the above-mentioned challenges. BayesPI2+ can be run in either a serial computation model or a parallel ensemble learning framework. BayesPI2+ allowed us to analyze all binding sites of the transcription factors, including weak binding that cannot be analyzed by other models. It is evaluated in both synthetic and real in vivo protein-DNA binding experiments. Analysing ESR1 and SPIB in breast carcinoma and activated B cell-like diffuse large B-cell lymphoma cell lines, respectively, revealed that the concerted binding to high and low affinity sites correlates best with gene expression. Conclusions: BayesPI2+ allows us to analyze transcription factor binding on a larger scale than hitherto achieved. By this analysis, we were able to demonstrate that genes are regulated by concerted binding to high and low affinity binding sites. The program and output results are publicly available at: http://folk.uio.no/junbaiw/BayesPI2Plus.