Variation in homeodomain DNA-binding revealed by high-resolution analysis of sequence preferences
ABSTRACT: Most homeodomains are unique within a genome, yet many are highly conserved across vast evolutionary distances, implying strong selection on their precise DNA-binding specificities. We determined the binding preferences of the majority (168) of mouse homeodomains to all possible 8-base sequences, revealing rich and complex patterns of sequence specificity, and showing for the first time that there are at least 65 distinct homeodomain DNA-binding activities. We developed a computational system that successfully predicts binding sites for homeodomain proteins as distant from mouse as Drosophila and C. elegans, and we infer full 8-mer binding profiles for the majority of known animal homeodomains. Our results provide an unprecedented level of resolution in the analysis of this simple domain structure and suggest that variation in sequence recognition may be a factor in its functional diversity and evolutionary success.
Keywords: Mouse homeodomain protein binding microarrays
178 protein binding microarray (PBM) experiments of mouse homeodomains were performed, with 10 proteins done in replicate. Briefly, the PBMs involved binding GST-tagged mouse homeodomains to custom-designed, double-stranded 44K Agilent microarrays in order to determine their sequence preferences. The method is described in Berger et al., Nature Biotechnology 2006. A key feature is that the microarrays are composed of de Bruijn sequences that contain each 10-base sequence once and only once, providing an evenly balanced sequence distribution. Individual de Bruijn sequences have different properties, including representation of gapped patterns. The array sequences as well as the primary array data are available via a EULA at http://the_brain.bwh.harvard.edu/pbms/webworks2/.
Here we provide the data transformed into median intensities (after normalization and detrending of the original array data) for all 32,896 8-base sequences, Z-scores for these intensities, and E-scores.
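The figure of 32,896 8-base sequences is consistent with treating each 8-mer and its reverse complement as a single entry, which is natural for double-stranded probes: (4^8 + 4^4) / 2 = 32,896, where 4^4 counts the 8-mers that equal their own reverse complement. The following is a minimal sketch of that count, assuming this reverse-complement convention; the function names are illustrative and do not come from the data files.

```python
from itertools import product

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical_kmers(k: int) -> list[str]:
    """List all k-mers, merging each k-mer with its reverse complement.

    Assumption: the lexicographically smaller of the two strands is used
    as the representative. The data files may use a different convention.
    """
    seen = set()
    for bases in product("ACGT", repeat=k):
        kmer = "".join(bases)
        seen.add(min(kmer, reverse_complement(kmer)))
    return sorted(seen)


if __name__ == "__main__":
    kmers = canonical_kmers(8)
    print(len(kmers))  # 32896
```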
E-scores are a modified version of AUC, and describe how well each 8-mer ranks the intensities of the spots. In general the E-scores are slightly more reproducible than Z-scores, but contain less information about relative binding affinity. Additional experimental details are found in Berger et al., Nature Biotechnology 2006, Berger et al., Cell 2008, and the accompanying Supplementary information.
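To illustrate the idea behind a rank-based score, the sketch below computes a plain AUC for one 8-mer: the probability that a randomly chosen spot containing the 8-mer has a higher intensity than a randomly chosen spot that does not. This is the unmodified AUC only, not the E-score itself; the modifications are described in Berger et al., Nature Biotechnology 2006. The function names and the toy intensities are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def auc(intensities, contains_kmer):
    """Plain AUC via the Mann-Whitney U statistic.

    intensities:   one signal value per array spot
    contains_kmer: one boolean per spot, True if the spot holds the 8-mer
    """
    n_fg = sum(contains_kmer)
    n_bg = len(intensities) - n_fg
    if n_fg == 0 or n_bg == 0:
        raise ValueError("need at least one foreground and one background spot")
    ranks = average_ranks(intensities)
    rank_sum = sum(r for r, fg in zip(ranks, contains_kmer) if fg)
    u = rank_sum - n_fg * (n_fg + 1) / 2
    return u / (n_fg * n_bg)


if __name__ == "__main__":
    # Toy data: the two brightest spots are the ones containing the 8-mer.
    toy_intensities = [120.0, 95.0, 310.0, 450.0]
    toy_contains = [False, False, True, True]
    print(auc(toy_intensities, toy_contains))
```

Because a score of this kind depends only on ranks, it discards the magnitude of the intensities, which is in line with the statement above that E-scores carry less information about relative binding affinity than Z-scores.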
ORGANISM(S): Mus musculus
SUBMITTER: Lourdes Pena-Castillo
PROVIDER: E-GEOD-11239 | biostudies-arrayexpress
REPOSITORIES: biostudies-arrayexpress