Project description:Accurate predictions of the DNA binding specificities of transcription factors (TFs) are necessary for understanding gene regulatory mechanisms. Traditionally, predictive models are built based on nucleotide sequence features. Here, we employed three-dimensional DNA shape information obtained on a high-throughput basis to integrate intuitive DNA structural features into the modeling of TF binding specificities using support vector regression. We performed quantitative predictions of DNA binding specificities, using the DREAM5 dataset for 65 mouse TFs and genomic-context protein binding microarray data for three human basic helix-loop-helix TFs. DNA shape-augmented models compared favorably with sequence-based models for these predictions. Although both k-mer and DNA shape features encoded the interdependencies between nucleotide positions of the binding site, using DNA shape features reduced the dimensionality of the feature space compared to k-mer use. Finally, analyzing the weights of DNA shape-augmented models uncovered TF family-specific structural readout mechanisms that were not obvious from the nucleotide sequence.
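The dimensionality argument above can be illustrated with a minimal sketch. It assumes a position-specific one-hot encoding of k-mers (each of the L - k + 1 windows gets 4^k indicator slots) and a hypothetical count of four shape parameters per nucleotide position; neither the encoding details nor the number of shape parameters is stated in the description, and the function names are illustrative only.

```python
from itertools import product

ALPHABET = "ACGT"

def kmer_feature_count(site_length: int, k: int) -> int:
    """Size of a position-specific one-hot k-mer encoding (assumed scheme)."""
    if not 1 <= k <= site_length:
        raise ValueError("k must be between 1 and the site length")
    return (site_length - k + 1) * len(ALPHABET) ** k

def shape_feature_count(site_length: int, n_shape_params: int = 4) -> int:
    """Size of a per-position shape encoding.

    n_shape_params=4 is an assumption for illustration, not a value
    taken from the description.
    """
    return site_length * n_shape_params

def one_hot_kmers(seq: str, k: int) -> list[int]:
    """Encode seq as concatenated one-hot vectors, one per k-mer window."""
    n_kmers = len(ALPHABET) ** k
    index = {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=k))}
    vec = [0] * ((len(seq) - k + 1) * n_kmers)
    for pos in range(len(seq) - k + 1):
        vec[pos * n_kmers + index[seq[pos:pos + k]]] = 1
    return vec

if __name__ == "__main__":
    site_length = 10  # hypothetical binding-site length
    for k in (1, 2, 3):
        print(f"{k}-mer features: {kmer_feature_count(site_length, k)}")
    print(f"shape features: {shape_feature_count(site_length)}")
```

Under these assumptions the k-mer encoding grows exponentially in k, while the shape encoding grows only linearly with site length, which is the kind of reduction the description refers to.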
Project description:Accurate predictions of the DNA binding specificities of transcription factors (TFs) are necessary for understanding gene regulatory mechanisms. Traditionally, predictive models are built based on nucleotide sequence features. Here, we employed three-dimensional DNA shape information obtained on a high-throughput basis to integrate intuitive DNA structural features into the modeling of TF binding specificities using support vector regression. We performed quantitative predictions of DNA binding specificities, using the DREAM5 dataset for 65 mouse TFs and genomic-context protein binding microarray data for three human basic helix-loop-helix TFs. DNA shape-augmented models compared favorably with sequence-based models for these predictions. Although both k-mer and DNA shape features encoded the interdependencies between nucleotide positions of the binding site, using DNA shape features reduced the dimensionality of the feature space compared to k-mer use. Finally, analyzing the weights of DNA shape-augmented models uncovered TF family-specific structural readout mechanisms that were not obvious from the nucleotide sequence. Three genomic-context protein binding microarray (gcPBM) experiments of human transcription factors were performed. Briefly, the gcPBMs involved binding His-tagged transcription factors c-Myc, Max, and Mad1 (Mxd1) to double-stranded 180K Agilent microarrays in order to determine their binding specificity for putative DNA binding sites in native genomic context. The arrays represent three categories of 36-bp sequences: 1) bound probes, 2) unbound probes (or negative controls), and 3) test probes. Bound probes corresponded to genomic regions bound in vivo by c-Myc, Max, or Mad2 (ChIP-seq P < 10^(-10) in HeLaS3 or K562 cells (ENCODE)) that contain at least two consecutive 8-mers with universal PBM E-score > 0.4 (Munteanu and Gordan, LNCS 2013). All putative binding sites occur at the same position within the probes on the array. 
“Unbound” probes corresponded to genomic regions with ChIP-seq P < 10^(-10) and a maximum 8-mer E-score < 0.2. We also designed test probes that contain, within constant flanking regions, all nnCACGTGnn 10-mers and 18 nnnCACGTGnnn 12-mers (where n = A, C, G, or T). Each DNA sequence represented on the array is present in 6 replicate spots. We report the gcPBM signal intensity for each spot. The PBM protocol is described in Berger et al., Nature Biotechnology 2006 (PMID 16998473).
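The test-probe design can be made concrete with a short sketch that enumerates every nnCACGTGnn 10-mer. It covers only the variable cores: the constant flanking regions and the choice of the 18 12-mers are not specified in the description, so they are left out, and the function name is illustrative.

```python
from itertools import product

def flanked_sites(core: str = "CACGTG", flank: int = 2) -> list[str]:
    """Return every sequence of the form n{flank} + core + n{flank}.

    With flank=2 this yields all nnCACGTGnn 10-mers (n = A, C, G, or T).
    """
    sites = []
    for bases in product("ACGT", repeat=2 * flank):
        left = "".join(bases[:flank])
        right = "".join(bases[flank:])
        sites.append(left + core + right)
    return sites

if __name__ == "__main__":
    ten_mers = flanked_sites()
    print(len(ten_mers))   # 4**4 combinations of the four variable positions
    print(ten_mers[:3])
```

With two variable bases on each side of the E-box there are 4^4 = 256 distinct 10-mers. The same enumeration with `flank=3` would give 4^6 = 4096 12-mers, of which the array carries only 18.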
Project description:Transcription factor-DNA interactions and their specificities have been described for many different classes of transcription factor families. However, heterodimeric transcription factor complexes remain poorly characterised. The basic helix-loop-helix (bHLH) transcription factor family is one of the largest transcription factor families, and its members typically bind DNA through degenerate CANNTG elements as heterodimers or homodimers. Here we characterise the DNA binding of the bHLH-Per-Arnt-Sim (bHLH-PAS) domain-containing transcription factor family using SELEX-high-throughput sequencing coupled with quantitative computational modelling analysis. We show that most dimeric bHLH-PAS transcription factors bind to distinct core NNCGTG response elements but bind over a much larger footprint than previously characterised. Modelled DNA-protein interactions were found to correlate with structural analysis, DNA shape predictions and in vivo transcription factor occupancy.
2022-07-02 | GSE159989 | GEO
Project description:DNA-binding specificities of human transcription factors
| PRJEB3291 | ENA
Project description:Understanding how eukaryotic enhancers are bound and regulated by specific combinations of transcription factors is still a major challenge. To better map transcription factor binding genome-wide at nucleotide resolution in vivo, we have developed a robust ChIP-exo protocol called ChIP-nexus (chromatin immunoprecipitation experiments with nucleotide resolution through exonuclease, unique barcode and single ligation), which utilizes an efficient DNA self-circularization step during library preparation. Application of ChIP-nexus to four proteins—human TBP, Drosophila NFkB, Twist and Max—showed that it outperformed existing ChIP protocols in resolution and specificity, pinpointed relevant binding sites within enhancers containing multiple binding motifs, and allowed for the analysis of in vivo binding specificities. Notably, we show that Max frequently interacted with DNA sequences next to its motif, and that this binding pattern correlated with local DNA-sequence features such as DNA shape. ChIP-nexus will be broadly applicable to the study of in vivo transcription factor binding specificity and its relationship to cis-regulatory changes in humans and model organisms.
Project description:The sequence specificity of DNA-binding proteins is the primary mechanism by which the cell recognizes genomic features. Here, we describe systematic determination of yeast transcription factor DNA-binding specificities. We obtained binding specificities for 112 DNA-binding proteins representing 19 distinct structural classes. One-third of the binding specificities have not been previously reported. Several binding sequences have striking genomic distributions relative to transcription start sites, supporting their biological relevance and suggesting a role in promoter architecture. Among these are Rsc3 binding sequences, containing the core CGCG, which are found preferentially ~100 bp upstream of transcription start sites. Mutation of RSC3 results in a dramatic increase in nucleosome occupancy in hundreds of proximal promoters containing a Rsc3 binding element, but has little impact on promoters lacking Rsc3 binding sequences, indicating that Rsc3 plays a broad role in targeting nucleosome exclusion at yeast promoters. Keywords: Protein binding microarrays, DNA, proteins