Unknown,Transcriptomics,Genomics,Proteomics

Dataset Information

0

Quantitative modeling of transcription factor binding specificities using DNA shape


ABSTRACT: Accurate predictions of the DNA binding specificities of transcription factors (TFs) are necessary for understanding gene regulatory mechanisms. Traditionally, predictive models are built based on nucleotide sequence features. Here, we employed three- dimensional DNA shape information obtained on a high-throughput basis to integrate intuitive DNA structural features into the modeling of TF binding specificities using support vector regression. We performed quantitative predictions of DNA binding specificities, using the DREAM5 dataset for 65 mouse TFs and genomic-context protein binding microarray data for three human basic helix-loop-helix TFs. DNA shape-augmented models compared favorably with sequence-based models for these predictions. Although both k-mer and DNA shape features encoded the interdependencies between nucleotide positions of the binding site, using DNA shape features reduced the dimensionality of the feature space compared to k-mer use. Finally, analyzing the weights of DNA shape-augmented models uncovered TF family- specific structural readout mechanisms that were not obvious from the nucleotide sequence. Three genomic-context protein binding microarray (gcPBM) experiments of human transcription factors were performed. Briefly, the gcPBMs involved binding his-tagged transcription factors c-Myc, Max, and Mad1(Mxd1) to double-stranded 180K Agilent microarrays in order to determine their binding specificity for putative DNA binding sites in native genomic context. Briefly, we represent three categories of 36-bp sequences: 1) bound probes, 2) unbound probes (or negative controls), and 3) test probes. Bound probes corresponded to genomic regions bound in vivo by c-Myc, Max, or Mad2 (ChIP-seq P < 10^(-10) in HeLaS3 or K562 celld (ENCODE)) that contain at least two consecutive 8-mers with universal PBM E-score > 0.4 (Munteanu and Gordan, LNCS 2013). All putative binding sites occur at the same position within the probes on the array. M-bM-^@M-^\UnboundM-bM-^@M-^] probes corresponded to genomic regions with ChIP-seq P < 10^(-10) and a maximum 8-mer E-score < 0.2. We also designed test probes that contain, within constant flanking regions, all nnCACGTGnn 10-mers and 18 nnnCACGTGnnn 12-mers (where n = A, C, G, or T). Each DNA sequence represented on the array is present in 6 replicate spots. We report the gcPBM signal intensity for each spot. The PBM protocol is described in Berger et al., Nature Biotechnology 2006 (PMID 16998473).

ORGANISM(S): Homo sapiens

SUBMITTER: Raluca Gordan 

PROVIDER: E-GEOD-59845 | biostudies-arrayexpress |

REPOSITORIES: biostudies-arrayexpress

altmetric image

Publications

Protein-DNA binding in the absence of specific base-pair recognition.

Afek Ariel A   Schipper Joshua L JL   Horton John J   Gordân Raluca R   Lukatsky David B DB  

Proceedings of the National Academy of Sciences of the United States of America 20141013 48


Until now, it has been reasonably assumed that specific base-pair recognition is the only mechanism controlling the specificity of transcription factor (TF)-DNA binding. Contrary to this assumption, here we show that nonspecific DNA sequences possessing certain repeat symmetries, when present outside of specific TF binding sites (TFBSs), statistically control TF-DNA binding preferences. We used high-throughput protein-DNA binding assays to measure the binding levels and free energies of binding  ...[more]

Similar Datasets

2014-10-01 | E-GEOD-61920 | biostudies-arrayexpress
2014-09-22 | E-GEOD-58570 | biostudies-arrayexpress
2013-03-29 | E-GEOD-44436 | biostudies-arrayexpress
2013-05-17 | E-GEOD-47026 | biostudies-arrayexpress
2013-03-29 | E-GEOD-44437 | biostudies-arrayexpress
2011-12-10 | E-GEOD-34306 | biostudies-arrayexpress
2015-01-21 | E-GEOD-65073 | biostudies-arrayexpress
2015-03-01 | E-GEOD-60200 | biostudies-arrayexpress
2008-12-26 | E-GEOD-12349 | biostudies-arrayexpress
2010-06-26 | E-GEOD-11239 | biostudies-arrayexpress