Project description:Chromatin immunoprecipitation (ChIP) followed by next-generation sequencing is a powerful technique that characterizes the genome-wide DNA-binding profile of a protein of interest. The general ChIP-seq workflow has been applied widely to many sample types and target proteins, but sample-specific optimization of various steps is necessary to achieve high-quality data. This protocol is specifically optimized for cultured human embryonic stem cells (hESCs), including steps to check sample quality and non-specific enrichment of "hyper-ChIPable" regions prior to sequencing. For complete details on the use and execution of this protocol, please refer to Gunne-Braden et al. (2020).
Project description:Motivation: In chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) and other short-read sequencing experiments, a considerable fraction of the short reads align to multiple locations on the reference genome (multi-reads). Inferring the origin of multi-reads is critical for accurately mapping reads to repetitive regions. Current state-of-the-art multi-read allocation algorithms rely on the read counts in the local neighborhood of the alignment locations and ignore the variation in the copy numbers of these regions. Copy-number variation (CNV) can directly affect the read densities and, therefore, bias allocation of multi-reads. Results: We propose cnvCSEM (CNV-guided ChIP-Seq by expectation-maximization algorithm), a flexible framework that incorporates CNV in multi-read allocation. cnvCSEM eliminates the CNV bias in multi-read allocation by initializing the read allocation algorithm with CNV-aware initial values. Our data-driven simulations illustrate that cnvCSEM leads to higher read coverage with satisfactory accuracy and lower loss in read-depth recovery (estimation). We evaluate the biological relevance of the cnvCSEM-allocated reads and the resultant peaks with the analysis of several ENCODE ChIP-seq datasets. Availability and implementation: Available at http://www.stat.wisc.edu/~qizhang/ Contact: qizhang@stat.wisc.edu or keles@stat.wisc.edu Supplementary information: Supplementary data are available at Bioinformatics online.
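The idea of allocating multi-reads from CNV-aware starting values can be illustrated with a toy sketch. This is not the cnvCSEM implementation: the function name `allocate_multireads`, its parameters, the pseudocount, and the default copy number of 2 are all assumptions made for illustration. Initial region weights are unique-read counts divided by copy number, and a simple EM-style loop then splits each multi-read fractionally among its candidate regions.

```python
def allocate_multireads(unique_counts, copy_number, multireads,
                        n_iter=10, pseudo=1.0):
    """Toy fractional allocation of multi-reads (hypothetical helper).

    unique_counts: dict region -> number of uniquely aligned reads
    copy_number:   dict region -> estimated copy number (must be > 0)
    multireads:    list of lists; each inner list holds the distinct
                   candidate regions one multi-read aligns to
    Returns one dict per multi-read mapping region -> allocated fraction.
    """
    regions = set(unique_counts) | {r for cands in multireads for r in cands}

    # CNV-aware start (assumption): unique-read count per copy, so an
    # amplified region does not attract multi-reads merely by being amplified.
    weight = {
        r: (unique_counts.get(r, 0) + pseudo) / copy_number.get(r, 2.0)
        for r in regions
    }

    alloc = []
    for _ in range(n_iter):
        # E-step: split each multi-read in proportion to current weights.
        alloc = []
        for cands in multireads:
            total = sum(weight[r] for r in cands)
            alloc.append({r: weight[r] / total for r in cands})

        # M-step: new weight = unique reads + fractional multi-reads.
        # Simplification of this sketch: copy number enters only at the start.
        weight = {r: unique_counts.get(r, 0) + pseudo for r in regions}
        for fractions in alloc:
            for r, f in fractions.items():
                weight[r] += f
    return alloc


# Example: region A has more unique reads but is amplified (4 copies).
example = allocate_multireads(
    unique_counts={"A": 20, "B": 10},
    copy_number={"A": 4, "B": 2},
    multireads=[["A", "B"]],
    n_iter=1,
)
```

With a single iteration the returned fractions reflect only the CNV-normalized starting weights, which makes the effect of the initialization easy to inspect.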
Project description:A major yet unresolved quest in decoding the human genome is the identification of the regulatory sequences that control the spatial and temporal expression of genes. Distant-acting transcriptional enhancers are particularly challenging to uncover because they are scattered among the vast non-coding portion of the genome. Evolutionary sequence constraint can facilitate the discovery of enhancers, but fails to predict when and where they are active in vivo. Here we present the results of chromatin immunoprecipitation with the enhancer-associated protein p300 followed by massively parallel sequencing, and map several thousand in vivo binding sites of p300 in mouse embryonic forebrain, midbrain and limb tissue. We tested 86 of these sequences in a transgenic mouse assay, which in nearly all cases demonstrated reproducible enhancer activity in the tissues that were predicted by p300 binding. Our results indicate that in vivo mapping of p300 binding is a highly accurate means for identifying enhancers and their associated activities, and suggest that such data sets will be useful to study the role of tissue-specific enhancers in human biology and disease on a genome-wide scale.
Project description:The estrogen receptor alpha (ESR1) is an important transcriptional regulator, known to mediate the effects of estrogen. Canonically, ESR1 is activated by its ligand estrogen. However, the role of unliganded ESR1 in transcriptional regulation has been gaining attention. We have recently shown that ligand-free ESR1 is a key regulator of several cytochrome P450 (CYP) genes in the liver; however, ligand-free ESR1 has not been characterized genome-wide in the human liver. To address this, ESR1 ChIP-Seq was conducted in human liver samples and in hepatocytes with or without 17beta-estradiol (E2) treatment. We identified both ligand-dependent and ligand-independent binding sites throughout the genome. These two ESR1 binding categories showed different genomic localization, pathway enrichment, and cofactor colocalization, indicating that ESR1 regulatory function differs depending on ligand availability. By analyzing existing ESR1 data from additional human cell lines, we uncovered a potential ligand-independent ESR1 activity, namely its co-enrichment with the zinc finger protein 143 (ZNF143). Furthermore, we identified ESR1 binding sites near many gene loci related to drug therapy, including the CYPs. Overall, this study shows distinct ligand-free and ligand-bound ESR1 chromatin binding profiles in the liver and suggests a potentially broad influence of ESR1 on drug metabolism and drug therapy.
Project description:Background: To facilitate deciphering the underlying transcriptional regulatory circuits in mouse embryonic stem (ES) cells, recent ChIP-seq data provided genome-wide binding locations of several key transcription factors (TFs); meanwhile, existing efforts profiled gene expression in ES cells and in their early differentiated state. It has been shown that the gene expression profiles are correlated with the binding of these TFs. However, it remains unclear whether other TFs, referred to as cofactors, participate in gene regulation by collaborating with the ChIP-seq TFs. Results: Based on our analyses of the ES gene expression profiles and the binding sites of potential cofactors in the vicinity of the ChIP-seq TF binding locations, we identified a list of co-binding features that show significantly different characteristics between gene expression patterns (activated or repressed gene expression in ES cells) at a false discovery rate of 10%. Gene classification with a subset of the identified features achieved up to 20% improvement over classification based only on the ChIP-seq TFs. More than one third of the reasoned regulatory roles of the cofactor candidates involved in these features are supported by the existing literature. Finally, the predicted target genes of the majority of candidates show the expected expression change in another independent data set, which serves as a supplementary validation of these candidates. Conclusions: Our results revealed a list of combinatorial genomic features that are significantly associated with gene expression in ES cells, suggesting potential cofactors of the ChIP-seq TFs for gene regulation.
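Selecting features "at a false discovery rate of 10%" can be illustrated with the Benjamini-Hochberg step-up procedure. The abstract does not say which FDR method was used, so the choice of Benjamini-Hochberg here is an assumption, and the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues, fdr=0.10):
    """Return a list of booleans: True where the test is called significant
    under the Benjamini-Hochberg procedure at the given FDR level."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])

    # Find the largest rank k such that p_(k) <= fdr * k / m.
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= fdr * rank / m:
            cutoff_rank = rank

    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for i in order[:cutoff_rank]:
        rejected[i] = True
    return rejected


# Hypothetical p-values for four co-binding features.
significant = benjamini_hochberg([0.001, 0.04, 0.03, 0.5], fdr=0.10)
```

Each entry of `significant` lines up with the p-value at the same position, so the result can be used directly to filter a feature list.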