Project description: Core transcription regulatory circuitry (CRC) comprises a small group of self-regulated transcription factors (TFs) and their interconnected regulatory loops. Studies in embryonic stem cells and other cellular models have revealed the elementary roles of CRCs in the transcriptional control of cell identity and cellular fate. Systematic identification and subsequent archiving of CRCs across diverse cell types and tissues are needed to explore both cell/tissue type-specific and disease-associated transcriptional networks. Here, we present a comprehensive and interactive database (dbCoRC, http://dbcorc.cam-su.org) of CRC models that are computationally inferred from the mapping of super-enhancers and the prediction of TF binding sites. The current version of dbCoRC contains CRC models for 188 human and 50 murine cell lines/tissue samples. Alongside the CRC models, the database also provides: (i) super-enhancer, typical-enhancer and H3K27ac landscapes for individual samples, (ii) putative binding sites of each core TF across the super-enhancer regions within the CRC and (iii) expression of each core TF in normal or cancer cells/tissues. dbCoRC will serve as a valuable resource for the scientific community to explore transcriptional control and regulatory circuitries in biological processes related to, but not limited to, lineage specification, tissue homeostasis and tumorigenesis.
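The definition of a CRC given above (self-regulated TFs joined by interconnected regulatory loops) can be illustrated with a small graph search. The following is a minimal sketch, not the dbCoRC pipeline: it assumes a hypothetical input `binds` that maps each TF to the set of TFs whose super-enhancers it is predicted to bind, and the TF names and the function `find_core_circuits` are invented for illustration. It uses brute-force enumeration, which is only practical for a handful of candidate TFs.

```python
from itertools import combinations


def find_core_circuits(binds):
    """Return maximal groups of auto-regulated, mutually regulating TFs.

    binds: dict mapping a TF name to the set of TF names whose
    super-enhancer it is predicted to bind (hypothetical input format).
    """
    # A TF is auto-regulated if it binds its own super-enhancer.
    auto = sorted(tf for tf, targets in binds.items() if tf in targets)

    def interconnected(group):
        # Every pair in the group must regulate each other.
        return all(b in binds[a] and a in binds[b]
                   for a, b in combinations(group, 2))

    circuits = []
    # Search from the largest group size down so that only maximal
    # groups are kept; smaller subsets of a found circuit are skipped.
    for size in range(len(auto), 1, -1):
        for group in combinations(auto, size):
            if interconnected(group) and not any(
                set(group) <= set(found) for found in circuits
            ):
                circuits.append(group)
    return circuits


# Toy input with made-up TF names; not taken from dbCoRC.
toy_binds = {
    "TF_A": {"TF_A", "TF_B", "TF_C"},
    "TF_B": {"TF_A", "TF_B", "TF_C"},
    "TF_C": {"TF_A", "TF_B", "TF_C", "TF_D"},
    "TF_D": {"TF_D", "TF_C"},
    "TF_E": {"TF_B"},  # no self-loop, so never part of a circuit
}

print(find_core_circuits(toy_binds))
```

In this toy input, `TF_E` is excluded because it does not bind its own super-enhancer, and `TF_D` forms a separate two-member loop with `TF_C` because it is not connected to `TF_A` or `TF_B`.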
Project description: Motivation: Chromatin immunoprecipitation sequencing (ChIP-seq) experiments are inexpensive and time-efficient, and result in massive datasets that introduce significant storage and maintenance challenges. To address the resulting Big Data problems, we propose a lossless and lossy compression framework specifically designed for ChIP-seq Wig data, termed ChIPWig. ChIPWig enables random access and summary statistics lookups, and it is based on the asymptotic theory of optimal point density design for nonuniform quantizers. Results: We tested the ChIPWig compressor on 10 ChIP-seq datasets generated by the ENCODE consortium. On average, lossless ChIPWig reduced the file sizes to merely 6% of the original, and offered a 6-fold compression rate improvement compared to bigWig. The lossy feature further reduced file sizes 2-fold compared to the lossless mode, with little or no effect on peak calling and motif discovery using specialized NarrowPeaks methods. The compression and decompression speeds are on the order of 0.2 sec/MB using general-purpose computers. Availability and implementation: The source code and binaries are freely available for download at https://github.com/vidarmehr/ChIPWig-v2, implemented in C++. Contact: milenkov@illinois.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
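To make the idea of a nonuniform quantizer concrete, the sketch below designs reconstruction levels for a list of signal values and then maps new values onto them. It is not ChIPWig's method: the abstract states that ChIPWig relies on asymptotic point density theory, whereas this example substitutes Lloyd's algorithm, a standard iterative way to design a nonuniform scalar quantizer. The function names `design_levels` and `quantize` and the sample values are assumptions made for illustration.

```python
from bisect import bisect_right


def _assign(levels, values):
    """Index of the nearest level for each value (levels sorted ascending)."""
    # Decision boundaries are the midpoints between adjacent levels.
    bounds = [(a + b) / 2 for a, b in zip(levels, levels[1:])]
    return [bisect_right(bounds, x) for x in values]


def design_levels(samples, n_levels, iterations=20):
    """Lloyd's algorithm in one dimension (illustrative, not ChIPWig's design)."""
    lo, hi = min(samples), max(samples)
    step = (hi - lo) / n_levels
    # Start from uniformly spaced levels over the observed range.
    levels = [lo + step * (i + 0.5) for i in range(n_levels)]
    for _ in range(iterations):
        cells = [[] for _ in levels]
        for x, idx in zip(samples, _assign(levels, samples)):
            cells[idx].append(x)
        # Move each level to the mean of its cell; keep it if the cell is empty.
        levels = [sum(c) / len(c) if c else lv for c, lv in zip(cells, levels)]
    return levels


def quantize(values, levels):
    """Replace each value by its nearest reconstruction level (lossy step)."""
    return [levels[i] for i in _assign(levels, values)]


# Made-up coverage values: many small counts and a few large peaks.
samples = [0, 0, 0, 1, 1, 2, 2, 3, 50, 100]
levels = design_levels(samples, n_levels=3)
print(levels)
print(quantize([0, 3, 40, 90], levels))
```

Because most of the sample values are small, the designed levels are spaced unevenly: one level serves the dense low-count region while the remaining levels track the rare peaks, which is the general motivation for nonuniform over uniform quantization of skewed signals.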
Project description: Summary: Cancer hallmarks rely on specific transcriptional programs, which are dysregulated by multiple mechanisms, including genomic aberrations in DNA regulatory regions. Genome-wide association studies have shown that many variants are found within putative enhancer elements. To provide insights into the regulatory role of enhancer-associated non-coding variants in the cancer epigenome, and to facilitate the identification of functional non-coding mutations, we present dbInDel, a database in which we have comprehensively analyzed enhancer-associated insertion and deletion variants for both human and murine samples using ChIP-Seq data. Moreover, we provide the identification and visualization of upstream TF binding motifs in InDel-containing enhancers. Downstream target genes are also predicted and analyzed in the context of cancer biology. The dbInDel database promotes the investigation of the functional contributions of non-coding variants in the cancer epigenome. Availability and implementation: The database, dbInDel, can be accessed from http://enhancer-indel.cam-su.org/. Supplementary information: Supplementary data are available at Bioinformatics online.
Project description: Motivation: Although chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-seq) or tiling array hybridization (ChIP-chip) is increasingly used to map genome-wide binding sites of transcription factors (TFs), it remains difficult to generate a quality ChIPx (i.e. ChIP-seq or ChIP-chip) dataset because of the tremendous amount of effort required to develop effective antibodies and efficient protocols. Moreover, most laboratories are unable to easily obtain ChIPx data for one or more TF(s) in more than a handful of biological contexts. Thus, standard ChIPx analyses primarily focus on analyzing data from one experiment, and the discoveries are restricted to a specific biological context. Results: We propose to enrich this existing data analysis paradigm by developing a novel approach, ChIP-PED, which superimposes ChIPx data on large amounts of publicly available human and mouse gene expression data containing a diverse collection of cell types, tissues and disease conditions to discover new biological contexts with potential TF regulatory activities. We demonstrate ChIP-PED using a number of examples, including a novel discovery that MYC, a human TF, plays an important functional role in pediatric Ewing sarcoma cell lines. These examples show that ChIP-PED increases the value of ChIPx data by allowing one to expand the scope of possible discoveries made from a ChIPx experiment. Availability: http://www.biostat.jhsph.edu/~gewu/ChIPPED/