Dataset Information

Current DNA motif models can lead to incorrect hypotheses about the genomic recruitment of transcription factors

ABSTRACT: The E2F family of transcription factors is typically described as binding the family consensus sequence TTTSSCGC, where S is G or C. Analysis of ChIP-seq experiments, however, reveals that this consensus sequence is found in only 10% of ChIP-seq peaks, suggesting that the mechanism for E2F sequence recognition cannot be explained using previous assumptions. To better understand E2F sequence specificity, we performed high-throughput Universal Protein Binding Microarray experiments to obtain the relative binding affinity for every possible 8-mer, as well as for a large number of bound and unbound probes in their native genomic sequence context. Our results show that while the consensus sequence is bound with relatively high affinity, numerous other 8-mers, many distinctly different from the consensus motif, are bound with similar or greater affinity. These data suggest that the mechanism for E2F sequence specificity is likely complex and cannot readily be explained through a simple consensus sequence. Because of this, complex regression models were created using the bound and unbound probe binding affinities, and these models were able to predict binding in vivo, where the consensus sequence and various E2F PWMs were not.
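As an illustration of the kind of check behind the "consensus found in only 10% of peaks" statement, the sketch below counts how many sequences contain TTTSSCGC on either strand. It is a minimal example, not the authors' pipeline: the function names and the toy sequences are assumptions made for illustration, and real peak sequences would come from the ChIP-seq data.

```python
import re

# IUPAC code S means G or C, so TTTSSCGC becomes the regex TTT[GC][GC]CGC.
CONSENSUS = re.compile(r"TTT[GC][GC]CGC")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def has_consensus(seq):
    """True if the consensus occurs on the given strand or its reverse complement."""
    seq = seq.upper()
    return bool(CONSENSUS.search(seq) or CONSENSUS.search(reverse_complement(seq)))


def fraction_with_consensus(peaks):
    """Fraction of sequences that contain at least one consensus match."""
    if not peaks:
        return 0.0
    return sum(has_consensus(p) for p in peaks) / len(peaks)


# Hypothetical toy sequences, not real ChIP-seq peaks.
toy_peaks = [
    "AAATTTGGCGCAA",  # match on the forward strand
    "GGGCGCCAAATT",   # match only on the reverse strand
    "ACGTACGTACGT",   # no match
    "TTTAACGC",       # AA is not allowed at the S positions
]
print(fraction_with_consensus(toy_peaks))
```

Scanning both strands matters here because a binding site can be read from either strand of the peak sequence.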

ORGANISM(S): synthetic construct, Homo sapiens

PROVIDER: GSE61854 | GEO | 2014/11/04

SECONDARY ACCESSION(S): PRJNA262558

REPOSITORIES: GEO

Similar Datasets

Project description: In this work, we used our recently developed method Spec-seq to characterize the binding specificity of the glucocorticoid receptor (GR) in vitro. This GR Spec-seq experiment was run twice separately (Dec 2014 and Mar 2015) with different library compositions. The basic workflow is the same as in our previous work on the lac repressor, published in Genetics 198.3 (2014): 1329-1343. Recombinant human GR protein was used for the in vitro DNA-binding and separation experiments. Bound and unbound DNA fragments were separated in EMSA gels, purified, and barcoded for Illumina sequencing. An initial experiment was performed using the putative consensus sequence: AGAACA GGG TGTTCT; randomized library 1: AGAACN NSN NGTTCT [diversity = 512]; randomized library 2: AGAANN GGG NNNTCT [diversity = 1024]; randomized library 3: AGAACA GGG TGNNNN [diversity = 256]; randomized library 4: AGAACA GGGC NNNTCT [diversity = 64]. The initial total library diversity was ~1853, with a library composition of 10% positive control sequence + randomized libraries 1-4 + 5% negative control sequence. A 5th library containing 2304 sequences based on DDNACW KKN KGTTCT, where D = "not C", N = "any base", W = "A or T" and K = "G or T", was subsequently prepared and analyzed using similar ratios of control sequences. Binding conditions for EMSA were 100 ng FAM-labelled dsDNA + 0/0.5/1/2/4 uM GR DBD protein for each lane, in 1X NEB buffer 4. GR DBD was prepared as previously described. The EMSA was performed using a 9% 33:1 acrylamide gel and TB buffer, and was run at 200 V for 30 min at 0 degrees C. The 2 uM protein lane was used for final sequencing. Bound/unbound fractions resulting from EMSA of these libraries and conditions were used to generate PWMs as described. The GR PWM generated through this analysis was used to define relative binding energies using the patser program (35), which can be accessed online at http://stormo.wustl.edu/consensus/cgi-bin/Server/Interface/patser.cgi. Derived binding affinities are proportional to the inverse of the natural log of the calculated energies.
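The per-library diversities quoted above follow from multiplying the number of bases allowed at each degenerate position. The sketch below shows that arithmetic using the standard IUPAC nucleotide codes; the function name `library_diversity` is an illustrative assumption and is not part of the Spec-seq software.

```python
# Standard IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def library_diversity(template):
    """Number of distinct sequences encoded by a degenerate template.

    Spaces in the template are ignored, so the spaced notation used in
    the description can be passed in directly.
    """
    total = 1
    for code in template.replace(" ", "").upper():
        total *= len(IUPAC[code])
    return total


templates = {
    "library 1": "AGAACN NSN NGTTCT",
    "library 2": "AGAANN GGG NNNTCT",
    "library 3": "AGAACA GGG TGNNNN",
    "library 4": "AGAACA GGGC NNNTCT",
    "library 5": "DDNACW KKN KGTTCT",
}
for name, template in templates.items():
    print(name, library_diversity(template))
```

For example, library 5 has two D positions (3 bases each), two N positions (4 each), one W and three K positions (2 each), giving 3 x 3 x 4 x 4 x 2 x 2 x 2 x 2 = 2304.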

Project description: Recent genome-scale ChIP-chip studies of transcription factors have shown that a low percentage of experimentally determined binding sites contain the consensus motif for the immunoprecipitated factor. In most cases, differences between in vivo target sites that contain or lack a consensus motif have not been explored. We have previously shown that most sites to which E2F family members are bound in vivo do not contain E2F consensus motifs. The main purpose of this study was to develop an understanding of how E2F binding specificity is achieved in vivo. In particular, we have addressed how E2F family members are recruited to core promoter regions that lack a consensus motif and are excluded from other regions that contain a consensus motif. Using promoter and ENCODE arrays, we have shown that the predominant factors specifying whether E2F is recruited to an in vivo binding site are a) the site must be in a core promoter and b) the promoter region must be utilized as a promoter by the transcriptional machinery in that particular cell type. We have tested three models for recruitment of E2F to core promoters lacking a consensus site, including a) indirect recruitment, b) looping to the core promoter mediated by an E2F bound to a distal consensus motif, and c) assisted binding of E2F to a site that weakly resembles an E2F consensus motif within the core promoter. To test these models, we developed a new in vivo assay, termed eChIP, which allows analysis of transcription factor binding to isolated promoter fragments. Our findings suggest that in vivo a) the presence of a consensus motif is not sufficient to recruit E2Fs, b) E2Fs can bind to isolated regions that lack a consensus motif, and c) binding can require regions other than the best match to the E2F PWM in the core promoter. Keywords: E2F, ChIP-chip, transcription factor binding, consensus motifs. The study comprises 37 ChIP-chip arrays (of these, 14 array sets are biological duplicates). 22 samples are included in this series; the rest can be found in the supplementary information of the following papers: Xu 2007, Jin 2006, Komashko 2008.

Project description: Motivation: The DNA binding specificity of a transcription factor (TF) is typically represented using a position weight matrix (PWM) model, which implicitly assumes that individual bases in a TF binding site contribute independently to the binding affinity, an assumption that does not always hold. For this reason, more complex models of binding specificity have been developed. However, these models have their own caveats: they typically have a large number of parameters, which makes them hard to learn and interpret. Results: We propose novel regression-based models of TF-DNA binding specificity, trained using high-resolution in vitro data from custom protein binding microarray (PBM) experiments. Our PBMs are specifically designed to cover a large number of putative DNA binding sites for the TFs of interest (yeast TFs Cbf1 and Tye7, and human TFs c-Myc, Max, and Mad2) in their native genomic context. These high-throughput, quantitative data are well suited for training complex models that take into account not only independent contributions from individual bases, but also contributions from di- and trinucleotides at various positions within or near the binding sites. To ensure that our models remain interpretable, we use feature selection to identify a small number of sequence features that accurately predict TF-DNA binding specificity. To further illustrate the accuracy of our regression models, we show that even in the case of paralogous TFs with highly similar PWMs, our new models can distinguish the specificities of individual factors. Thus, our work represents an important step towards better sequence-based models of individual TF-DNA binding specificity. Four protein binding microarray (PBM) experiments of human transcription factors were performed.
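To make the idea of position-specific mono-, di-, and trinucleotide features concrete, the sketch below turns a sequence into a sparse set of indicator features. It is a minimal illustration under assumptions: the function name and the `(k, position, k-mer)` key layout are hypothetical, and the regression and feature-selection steps described above are not shown.

```python
def positional_kmer_features(seq, max_k=3):
    """Return indicator features for every k-mer (k = 1..max_k) at every position.

    Each key is a tuple (k, position, k-mer). A PWM-like model would use
    only the k = 1 features; the k = 2 and k = 3 features let a regression
    model capture dependencies between neighbouring bases.
    """
    seq = seq.upper()
    features = {}
    for k in range(1, max_k + 1):
        for start in range(len(seq) - k + 1):
            features[(k, start, seq[start:start + k])] = 1
    return features


# The E-box core CACGTG yields 6 mono-, 5 di- and 4 trinucleotide features.
example = positional_kmer_features("CACGTG")
print(len(example))
print(sorted(key for key in example if key[0] == 2))
```

A sequence of length L produces L + (L - 1) + (L - 2) features with `max_k=3`, which is why selecting a small subset of features helps keep such models interpretable.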
Briefly, the PBMs involved binding GST-tagged transcription factors c-Myc, Max, and Mad2 (Mxi1) to double-stranded 180K Agilent microarrays in order to determine their binding specificity for putative DNA binding sites in native genomic context. The arrays represent three categories of 36-bp sequences: 1) bound probes, 2) unbound probes (or negative controls), and 3) test probes. Bound probes corresponded to genomic regions bound in vivo by c-Myc, Max, or Mad2 (ChIP-seq P < 10^(-10) in HeLaS3 or K562 cells (ENCODE)) that contain at least two consecutive 8-mers with universal PBM E-score > 0.4 (Munteanu and Gordan, LNCS 2013). All putative binding sites occur at the same position within the probes on the array. "Unbound" probes corresponded to genomic regions with ChIP-seq P < 10^(-10) and a maximum 8-mer E-score < 0.2. We also designed test probes that contain, within constant flanking regions, all nnCACGTGnn 10-mers and 18 nnnCACGTGnnn 12-mers (where n = A, C, G, or T). Each DNA sequence represented on the array is present in 6 replicate spots. We report the PBM signal intensity for each spot. The PBM protocol is described in Berger et al., Nature Biotechnology 2006 (PMID 16998473).
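The E-score criteria for bound and unbound probes, and the enumeration of the nnCACGTGnn test 10-mers, can be sketched as follows. This is an illustration under stated assumptions, not the design code used for the arrays: the `escore` lookup table, the function names, the label strings, and the choice to treat missing 8-mers as 0.0 are all hypothetical, and the ChIP-seq P-value criterion is not modelled.

```python
from itertools import product


def kmers(seq, k=8):
    """All overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def classify_probe(seq, escore, high=0.4, low=0.2):
    """Apply the E-score rules from the description to one sequence.

    escore is a hypothetical dict mapping 8-mers to universal PBM E-scores;
    8-mers missing from it are treated as 0.0 (an assumption).
    """
    scores = [escore.get(kmer, 0.0) for kmer in kmers(seq.upper())]
    if not scores:
        raise ValueError("sequence is shorter than 8 bp")
    # Bound: at least two consecutive 8-mers with E-score above the threshold.
    if any(a > high and b > high for a, b in zip(scores, scores[1:])):
        return "bound-candidate"
    # Unbound: the maximum 8-mer E-score is below the lower threshold.
    if max(scores) < low:
        return "unbound-candidate"
    return "neither"


def ebox_10mers():
    """All nnCACGTGnn 10-mers, with n ranging over A, C, G and T."""
    return [
        a + b + "CACGTG" + c + d
        for a, b, c, d in product("ACGT", repeat=4)
    ]


print(len(ebox_10mers()))
```

With four free positions there are 4^4 = 256 such 10-mers, which is the set the test probes are described as covering in full.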
