Characterizing protein-DNA binding event subtypes in ChIP-exo data
ABSTRACT: Regulatory proteins associate with the genome either by directly binding cognate DNA motifs or via protein-protein interactions with other regulators. Each genomic recruitment mechanism may be associated with distinct motifs, and may also result in distinct characteristic patterns in high-resolution protein-DNA binding assays. For example, the ChIP-exo protocol precisely characterizes protein-DNA crosslinking patterns by combining chromatin immunoprecipitation (ChIP) with 5’ to 3’ exonuclease digestion. Since different regulatory complexes will result in different protein-DNA crosslinking signatures, analysis of ChIP-exo sequencing tag patterns should enable detection of multiple protein-DNA binding modes for a given regulatory protein. However, current ChIP-exo analysis methods either treat all binding events as being of a uniform type, or rely on the presence of DNA motifs to cluster binding events into subtypes. To systematically detect multiple protein-DNA interaction modes in a single ChIP-exo experiment, we introduce the ChIP-exo mixture model (ChExMix). ChExMix probabilistically models the genomic locations and subtype membership of protein-DNA binding events using both ChIP-exo tag enrichment patterns and DNA sequence information, thus offering a principled and robust approach to characterizing binding subtypes in ChIP-exo data. We demonstrate that ChExMix achieves accurate detection and classification of binding event subtypes using in silico mixed ChIP-exo data. We further demonstrate the unique analysis abilities of ChExMix using a collection of ChIP-exo experiments that profile the binding of key transcription factors in MCF-7 cells. In these data, ChExMix detects cooperative binding interactions between FoxA1, ERalpha, and CTCF, thus demonstrating that ChExMix can effectively stratify ChIP-exo binding events into biologically meaningful subtypes.
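As a rough illustration of the subtype-membership idea described above, the sketch below computes the posterior probability that one binding event belongs to each of several subtypes, given the offsets of its tags relative to the event center. It is not the ChExMix implementation: the function name, the dictionary-based tag profiles, and the probability floor are assumptions made for illustration, and the sketch ignores the DNA sequence information and the profile learning that the abstract describes.

```python
import math

def subtype_posteriors(tag_offsets, profiles, priors, floor=1e-6):
    """Posterior subtype probabilities for a single binding event.

    tag_offsets: tag positions relative to the event center (ints).
    profiles:    one dict per subtype mapping offset -> probability
                 (hypothetical, fixed tag distributions).
    priors:      mixing weight of each subtype.
    floor:       assumed probability for offsets missing from a profile.
    """
    log_scores = []
    for profile, prior in zip(profiles, priors):
        # Log-likelihood of the observed tags under this subtype's profile.
        score = math.log(prior)
        for offset in tag_offsets:
            score += math.log(profile.get(offset, floor))
        log_scores.append(score)
    # Normalize in log space to avoid numerical underflow.
    top = max(log_scores)
    weights = [math.exp(s - top) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]

# Two hypothetical subtypes with tags concentrated at different offsets.
narrow = {-10: 0.5, 10: 0.5}
wide = {-30: 0.5, 30: 0.5}
print(subtype_posteriors([-10, 10, -10], [narrow, wide], [0.5, 0.5]))
```

In a full mixture model this calculation would be one expectation step, alternating with re-estimation of the profiles and mixing weights.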
ORGANISM(S): Saccharomyces cerevisiae, Homo sapiens
Project description:Each protein within a regulatory complex associates with the genome either by binding DNA directly or by forming protein-protein interactions with DNA-bound proteins. In the chromatin immunoprecipitation (ChIP) assay, each protein's unique mode of genomic association may be reflected by its pattern of formaldehyde-induced crosslinks to the DNA sequences that are in very close proximity. The ChIP-exo protocol precisely delineates protein-DNA crosslinking patterns by combining ChIP with 5' to 3' exonuclease digestion. Within a regulatory complex, the physical distance of a regulatory protein to the DNA affects crosslinking efficiencies. Therefore, the spatial organization of a protein-DNA complex could potentially be inferred by analyzing how crosslinking signatures vary between the subunits of a regulatory complex, and how they remain consistent over a set of coordinately regulated regions. Here, we present a computational framework that aligns ChIP-exo crosslinking patterns from multiple proteins across a set of regulatory regions, and that detects and quantifies protein-DNA crosslinking events within the aligned profiles. Our gapped multiple profile alignment approach does not rely on sequence motif features, but rather operates directly on the multi-protein, strand-separated ChIP-exo tag patterns. The output of the alignment approach is a set of composite profiles that represent the crosslinking signatures of the complex across the analyzed regulatory regions. We then use a probabilistic mixture model to deconvolve individual crosslinking events within the aligned ChIP-exo profiles, enabling consistent measurements of protein-DNA crosslinking strengths across multiple proteins. Lastly, we apply dimensionality reduction to visualize the relative organization of proteins within the regulatory complex. We demonstrate our approach by applying it to characterize regulatory complex organization in three biological settings.
Firstly, we demonstrate that our alignment approach can recover the known organization of regulatory proteins at yeast ribosomal protein genes, without relying on any DNA sequence features. Secondly, we apply our gapped alignment and crosslinking quantification approaches to a novel set of ChIP-exo data to characterize the spatial organization of Pol III transcriptional machinery assembly at yeast tRNA genes. Finally, we demonstrate that our approach can be used to quantify changes in protein-DNA complex organization when applied to ChIP-nexus data from Drosophila Pol II transcriptional components in two experimental conditions. Our results suggest that principled analyses of ChIP-exo crosslinking patterns enable inference of spatial organization within protein-DNA complexes.
Project description:Chromatin immunoprecipitation (ChIP) and its derivatives are the main techniques used to determine transcription factor binding sites. However, conventional ChIP with sequencing (ChIP-seq) suffers from poor resolution, and newer techniques require significant experimental alterations and complex bioinformatics. Here we build upon our high-resolution crosslinking ChIP-seq (X-ChIP-seq) method and compare it to existing methodologies. By fragmenting the chromatin with micrococcal nuclease, which has both endo- and exonuclease activity and thereby generates precise protein-DNA footprints, high-resolution X-ChIP-seq achieves single-base-pair resolution of transcription factor binding. A significant advantage of this protocol is that it requires minimal alteration to the conventional ChIP-seq workflow and only simple bioinformatic processing. Using high-resolution X-ChIP-seq, we determined the genome-wide binding profiles of various DNA-binding proteins.
Project description:Genome-wide mapping of transcription factor binding is generally performed by chemical protein-DNA crosslinking, followed by chromatin immunoprecipitation and deep sequencing (ChIP-seq). Here we present a ChIP-seq technique based on photochemical crosslinking of protein-DNA interactions by high-intensity ultraviolet (UV) laser irradiation in living mammalian cells (UV-ChIP-seq). UV laser irradiation induces efficient and instant formation of covalent “zero-length” crosslinks exclusively between nucleic acids and proteins that are in immediate contact, thus resulting in a “snapshot” of direct protein-DNA interactions in their natural environment. We applied UV-ChIP-seq to genome-wide profiling of the sequence-specific transcriptional repressor B-cell lymphoma 6 (BCL6) in human diffuse large B-cell lymphoma (DLBCL) cells. Our approach resulted in sensitive and precise protein-DNA binding profiles, highly enriched in canonical BCL6 DNA sequence motifs. UV-ChIP-seq also revealed numerous previously undetectable BCL6 binding sites, particularly in more condensed, inaccessible areas of chromatin.
Project description:Although the DNA motifs recognized by transcription factors (TFs) have been determined, challenges remain in probing the in vivo architecture of TF-DNA complexes on a genome-wide scale. Here, we show the in vivo architecture of Escherichia coli arginine repressor (ArgR)-DNA complexes using chromatin immunoprecipitation with exonuclease digestion followed by sequencing (ChIP-exo). The 62 identified ArgR-binding loci were classified into three groups, comprising single, double, and triple peak-pairs, respectively. Each peak-pair has a unique 93 bp-long (±2 bp) ArgR-binding sequence containing two ARG boxes (39 bp) and residual sequence. Moreover, the peak-pairs revealed three ArgR-binding modes defined by the position of the two ARG boxes, indicating that DNA bending, apparently centered between the pair of ARG boxes, facilitates non-specific contacts between ArgR subunits and the residual sequences. Thus, our data suggest an in vivo architecture of ArgR-DNA complexes that helps explain the transcriptional regulatory mechanism of ArgR. ChIP-exo profiles of ArgR (+Arginine) and ArgR (-Arginine) were generated by deep sequencing in duplicate using Illumina MiSeq.
Project description:Erythroid development and differentiation from multiprogenitor cells to red blood cells requires precise transcriptional regulation. Key erythroid transcription factors, GATA1 and TAL1, co-operate, along with other proteins, to regulate many aspects of this process. How GATA1 and TAL1 are positionally organized with respect to each other and their cognate DNA binding sites across the mouse genome remains unclear. We applied high-resolution ChIP-exo to GATA1 and TAL1 to study their positional organization across the mouse genome during GATA1-dependent maturation. Two complementary methods, MultiGPS and peak-pairing, were used to determine high-confidence binding locations by ChIP-exo. We identified ~10,000 GATA1 and ~15,000 TAL1 locations, which were essentially confirmed by ChIP-seq. Of these, ~4,000 locations were bound by both GATA1 and TAL1. About three-quarters of these were tightly linked (<40 bp away) to a partial E-box located 7-8 bp upstream of a WGATAA motif. Both TAL1 and GATA1 generated distinct characteristic ChIP-exo peaks around WGATAA motifs that reflect their positional arrangement within a complex. We show that TAL1 and GATA1 form a precisely organized complex at a compound motif consisting of a TG 7-8 bp upstream of a WGATAA motif across thousands of genomic locations. Genome-wide analysis of GATA1 and TAL1 in G1E and G1E-ER4 cells using ChIP-exo experiments.
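The compound motif described above (a TG 7-8 bp upstream of WGATAA, where W is A or T) can be located with a simple pattern scan. The sketch below is only an illustration and not the authors' pipeline: it assumes that "7-8 bp upstream" means a gap of 7 or 8 nucleotides between the TG and the WGATAA, and the function names are hypothetical.

```python
import re

# Assumed reading of the compound motif: TG, a 7-8 nt gap, then WGATAA (W = A/T).
# The lookahead lets matches that overlap one another be reported.
COMPOUND = re.compile(r"(?=(TG[ACGT]{7,8}[AT]GATAA))")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_compound_motifs(seq):
    """List (start, strand, match) for compound motifs on both strands.

    start is the 0-based leftmost coordinate of the match on the
    forward strand, whichever strand the motif lies on.
    """
    seq = seq.upper()
    hits = [(m.start(), "+", m.group(1)) for m in COMPOUND.finditer(seq)]
    rc = reverse_complement(seq)
    for m in COMPOUND.finditer(rc):
        # Convert the reverse-strand coordinate back to the forward strand.
        start = len(seq) - (m.start() + len(m.group(1)))
        hits.append((start, "-", m.group(1)))
    return hits

# Toy sequence with one forward-strand motif (7 nt gap).
print(find_compound_motifs("CCTGACGTACGAGATAACC"))
```

Intersecting such motif coordinates with ChIP-exo binding locations would be one way to examine how tightly bound sites are linked to the compound motif.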
Project description:In vivo crosslinking studies suggest that the Drosophila transcription factor Bicoid (Bcd) binds to several thousand sites during early embryogenesis, but it is not clear how many of these binding events are functionally important. In contrast, reporter gene studies have identified more than 60 Bcd-dependent enhancers, all of which contain clusters of the consensus binding sequence TAATCC. These studies also identified clusters of TAATCC motifs (inactive fragments) that failed to drive Bcd-dependent activation. In general, active fragments showed higher levels of Bcd-binding in vivo, and were enriched in predicted binding sites for the ubiquitous maternal protein Zelda (Zld). Here we test the role of Zld in Bcd-mediated binding and transcription. Removal of Zld function and mutations in Zld sites caused significant reductions in Bcd-binding to known enhancers, and variable effects on the activation and spatial positioning of Bcd-dependent expression patterns. Genome-wide binding experiments in Zld mutants showed variable effects on Bcd-binding peaks, ranging from strong reductions to significantly enhanced levels of binding. Increases in Bcd binding caused the precocious Bcd-dependent activation of genes that are normally not expressed in early embryos, suggesting that Zld controls the genome-wide binding profile of Bcd at the qualitative level, and is critical for selecting target genes for activation in the early embryo. These results underscore the importance of combinatorial binding in enhancer function, and provide data that will help predict regulatory activities based on DNA sequence. 1-3 hour wild type and zelda mutant embryos were examined for Bicoid binding using Illumina HiSeq 2000. The Bcd antibody is a rabbit polyclonal antiserum raised against full length Drosophila Bcd protein (Ochoa-Espinosa et al. 2009 PNAS). The antiserum was further purified using the Protein A Antibody Purification Kit (Sigma) for ChIP-seq.
Project description:Genome-wide chromatin-immunoprecipitation (ChIP-chip) detects binding of transcriptional regulators to DNA in vivo at low resolution. Motif discovery algorithms can be used to discover sequence patterns in the bound regions that may be recognized by the immunoprecipitated protein. However, the discovered motifs often do not agree with the binding specificity of the protein, when it is known. We present a powerful approach to analyzing ChIP-chip data, called THEME, that tests hypotheses concerning the sequence specificity of a protein. Hypotheses are refined using constrained local optimization. Cross-validation provides a principled standard for selecting the optimal weighting of the hypothesis and the ChIP-chip data and for choosing the best refined hypothesis. We demonstrate how to derive hypotheses for proteins from 36 domain families. Using THEME together with these hypotheses, we analyze ChIP-chip datasets for 14 human and mouse proteins. In all cases, the identified motifs are consistent with the published data with regard to the binding specificity of the proteins.