Project description:Cell-free DNA (cfDNA) in human plasma provides access to molecular information about the pathological processes in the organs or tumors from which it originates. These DNA fragments are derived from fragmented chromatin in dying cells and retain some of the cell-of-origin histone modifications. In this study, we applied chromatin immunoprecipitation of cell-free nucleosomes carrying active chromatin modifications followed by sequencing (cfChIP-seq) to 268 human samples. In healthy donors, we identified bone marrow megakaryocytes, but not erythroblasts, as major contributors to the cfDNA pool. In patients with a range of liver diseases, we showed that we can identify pathology-related changes in hepatocyte transcriptional programs. In patients with metastatic colorectal carcinoma, we detected clinically relevant and patient-specific information, including transcriptionally active human epidermal growth factor receptor 2 (HER2) amplifications. Altogether, cfChIP-seq, using low sequencing depth, provides systemic and genome-wide information and can inform diagnosis and facilitate interrogation of physiological and pathological processes using blood samples.
Project description:Cell-free DNA (cfDNA) in human plasma provides access to molecular information about the pathological processes in the organs or tumors from which it originates. These DNA fragments are derived from fragmented chromatin in dying cells and retain some of the cell-of-origin histone modifications. Here, we apply chromatin immunoprecipitation of cell-free nucleosomes carrying active chromatin modifications followed by sequencing (cfChIP-seq) to 268 human samples. In healthy donors, we identified bone marrow megakaryocytes, but not erythroblasts, as major contributors to the cfDNA pool. In patients with a range of liver diseases, we show that we can identify pathology-related changes in hepatocyte transcriptional programs. In metastatic colorectal carcinoma patients, we detected clinically relevant and patient-specific information, including transcriptionally active HER2 amplifications. Altogether, cfChIP-seq, using low sequencing depth, provides systemic and genome-wide information, can inform diagnosis, and can facilitate interrogation of physiological and pathological processes using blood samples.
Project description:Background: ChIPx (i.e., ChIP-seq and ChIP-chip) is increasingly used to map genome-wide transcription factor (TF) binding sites. A single ChIPx experiment can identify thousands of TF bound genes, but typically only a fraction of these genes are functional targets that respond transcriptionally to perturbations of TF expression. To identify promising functional target genes for follow-up studies, researchers usually collect gene expression data from TF perturbation experiments to determine which of the TF targets respond transcriptionally to binding. Unfortunately, approximately 40% of ChIPx studies do not have accompanying gene expression data from TF perturbation experiments. For these studies, genes are often prioritized solely based on the binding strengths of ChIPx signals in order to choose follow-up candidates. ChIPXpress is a novel method that improves upon this ChIPx-only ranking approach by integrating ChIPx data with large amounts of Publicly available gene Expression Data (PED). Results: We demonstrate that PED does contain useful information to identify functional TF target genes despite its inherent heterogeneity. A truncated absolute correlation measure is developed to better capture the regulatory relationships between TFs and their target genes in PED. By integrating the information from ChIPx and PED, ChIPXpress can significantly increase the chance of finding functional target genes responsive to TF perturbation among the top ranked genes. ChIPXpress is implemented as an easy-to-use R/Bioconductor package. We evaluate ChIPXpress using 10 different ChIPx datasets in mouse and human and find that ChIPXpress rankings are more accurate than rankings based solely on ChIPx data and may result in substantial improvement in prediction accuracy, irrespective of which peak calling algorithm is used to analyze the ChIPx data. Conclusions: ChIPXpress provides a new tool to better prioritize TF bound genes from ChIPx experiments for follow-up studies when investigators do not have their own gene expression data. It demonstrates that the regulatory information from PED can be used to boost ChIPx data analyses. It also represents an important step towards more fully utilizing the valuable, but highly heterogeneous data contained in public gene expression databases.
Project description:Identifying gene expression programs underlying both cell-type identity and cellular activities (e.g. life-cycle processes, responses to environmental cues) is crucial for understanding the organization of cells and tissues. Although single-cell RNA-Seq (scRNA-Seq) can quantify transcripts in individual cells, each cell's expression profile may be a mixture of both types of programs, making them difficult to disentangle. Here, we benchmark and enhance the use of matrix factorization to solve this problem. We show with simulations that a method we call consensus non-negative matrix factorization (cNMF) accurately infers identity and activity programs, including their relative contributions in each cell. To illustrate the insights this approach enables, we apply it to published brain organoid and visual cortex scRNA-Seq datasets; cNMF refines cell types and identifies both expected (e.g. cell cycle and hypoxia) and novel activity programs, including programs that may underlie a neurosecretory phenotype and synaptogenesis.
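For orientation, the factorization that matrix-factorization approaches of this kind build on can be written as below. This is the generic non-negative matrix factorization objective with a Frobenius loss, given here as an assumption: the description above does not state which loss or normalization cNMF uses, and the symbols X, W, H, n, g and k are introduced only for illustration.

```latex
X \approx W H, \qquad
\min_{W \ge 0,\; H \ge 0} \; \lVert X - W H \rVert_F^2,
\qquad
X \in \mathbb{R}_{\ge 0}^{n \times g},\quad
W \in \mathbb{R}_{\ge 0}^{n \times k},\quad
H \in \mathbb{R}_{\ge 0}^{k \times g}
```

Under this reading, X holds the expression of g genes in n cells, each row of H is one expression program (identity or activity), and each row of W gives the relative contribution of the k programs in one cell.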
Project description:Cultured cell lines are the workhorse of cancer research, but the extent to which they recapitulate the heterogeneity observed among malignant cells in tumors is unclear. Here we used multiplexed single-cell RNA-seq to profile 198 cancer cell lines from 22 cancer types. We identified 12 expression programs that are recurrently heterogeneous within multiple cancer cell lines. These programs are associated with diverse biological processes, including cell cycle, senescence, stress and interferon responses, epithelial-mesenchymal transition and protein metabolism. Most of these programs recapitulate those recently identified as heterogeneous within human tumors. We prioritized specific cell lines as models of cellular heterogeneity and used them to study subpopulations of senescence-related cells, demonstrating their dynamics, regulation and unique drug sensitivities, which were predictive of clinical response. Our work describes the landscape of heterogeneity within diverse cancer cell lines and identifies recurrent patterns of heterogeneity that are shared between tumors and specific cell lines.
Project description:The estrogen receptor alpha (ESR1) is an important gene transcriptional regulator, known to mediate the effects of estrogen. Canonically, ESR1 is activated by its ligand estrogen. However, the role of unliganded ESR1 in transcriptional regulation has been gaining attention. We have recently shown that ligand-free ESR1 is a key regulator of several cytochrome P450 (CYP) genes in the liver; however, ligand-free ESR1 has not been characterized genome-wide in the human liver. To address this, ESR1 ChIP-Seq was conducted in human liver samples and in hepatocytes with or without 17beta-estradiol (E2) treatment. We identified both ligand-dependent and ligand-independent binding sites throughout the genome. These two ESR1 binding categories showed different genomic localization, pathway enrichment, and cofactor colocalization, indicating different ESR1 regulatory function depending on ligand availability. By analyzing existing ESR1 data from additional human cell lines, we uncovered a potential ligand-independent ESR1 activity, namely its co-enrichment with the zinc finger protein 143 (ZNF143). Furthermore, we identified ESR1 binding sites near many gene loci related to drug therapy, including the CYPs. Overall, this study shows distinct ligand-free and ligand-bound ESR1 chromatin binding profiles in the liver and suggests the potential broad influence of ESR1 in drug metabolism and drug therapy.
Project description:T cell anergy is one of the mechanisms contributing to peripheral tolerance, particularly in the context of progressively growing tumors and in tolerogenic treatments promoting allograft acceptance. We recently reported that early growth response gene 2 (Egr2) is a critical transcription factor for the induction of anergy in vitro and in vivo, which was identified based on its ability to regulate the expression of inhibitory signaling molecules diacylglycerol kinase (DGK)-α and -ζ. We reasoned that other transcriptional targets of Egr2 might encode additional factors important for T cell anergy and immune regulation. Thus, we conducted two sets of genome-wide screens: gene expression profiling of wild type versus Egr2-deleted T cells treated under anergizing conditions, and a ChIP-Seq analysis to identify genes that bind Egr2 in anergic cells. Merging of these data sets revealed 49 targets that are directly regulated by Egr2. Among these are inhibitory signaling molecules previously reported to contribute to T cell anergy, but unexpectedly, also cell surface molecules and secreted factors, including lymphocyte-activation gene 3 (Lag3), Class-I-MHC-restricted T cell associated molecule (Crtam), Semaphorin 7A (Sema7A), and chemokine CCL1. These observations suggest that anergic T cells might not simply be functionally inert, and may have additional functional properties oriented towards other cellular components of the immune system.
Project description:Chromatin immunoprecipitation coupled with high-throughput DNA sequencing (ChIP-Seq) is a powerful technology to profile the location of proteins of interest on a whole-genome scale. To identify the enrichment location of proteins, many programs and algorithms have been proposed. However, none of the commonly used peak calling programs could accurately explain the binding features of target proteins detected by ChIP-Seq. Here, publicly available data on 12 histone modifications, including H3K4ac/me1/me2/me3, H3K9ac/me3, H3K27ac/me3, H3K36me3, H3K56ac, and H3K79me1/me2, generated from a human embryonic stem cell line (H1), were profiled with five peak callers (CisGenome, MACS1, MACS2, PeakSeq, and SISSRs). The performance of the peak calling programs was compared in terms of reproducibility between replicates, examination of enriched regions to variable sequencing depths, the specificity-to-noise signal, and sensitivity of peak prediction. There were no major differences among peak callers when analyzing point source histone modifications. The peak calling results from histone modifications with low fidelity, such as H3K4ac, H3K56ac, and H3K79me1/me2, showed low performance in all parameters, which indicates that their peak positions might not be located accurately. Our comparative results could provide a helpful guide to choose a suitable peak calling program for specific histone modifications.
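One of the criteria named above, reproducibility between replicates, can be illustrated with a simple overlap measure. The sketch below is not the procedure used in the study, which the description does not specify; it assumes peaks are given as hypothetical `(chrom, start, end)` tuples with half-open coordinates and reports the fraction of peaks in one replicate that overlap any peak in the other.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def fraction_reproduced(rep1, rep2):
    """Fraction of peaks in rep1 that overlap at least one peak in rep2.

    Quadratic in the number of peaks; fine for a small illustration,
    but real peak sets would call for sorted or indexed intervals.
    """
    if not rep1:
        return 0.0
    hits = sum(any(overlaps(p, q) for q in rep2) for p in rep1)
    return hits / len(rep1)


# Hypothetical peak calls from two replicates of the same histone mark.
replicate_1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
replicate_2 = [("chr1", 150, 250), ("chr2", 300, 400)]

score = fraction_reproduced(replicate_1, replicate_2)
print(f"{score:.3f}")
```

The measure is asymmetric, so in practice one would likely compute it in both directions, or per peak caller and histone mark, before comparing programs.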
Project description:Gene set enrichment testing can enhance the biological interpretation of ChIP-seq data. Here, we develop a method, ChIP-Enrich, for this analysis which empirically adjusts for gene locus length (the length of the gene body and its surrounding non-coding sequence). Adjustment for gene locus length is necessary because it is often positively associated with the presence of one or more peaks and because many biologically defined gene sets have an excess of genes with longer or shorter gene locus lengths. Unlike alternative methods, ChIP-Enrich can account for the wide range of gene locus length-to-peak presence relationships (observed in ENCODE ChIP-seq data sets). We show that ChIP-Enrich has a well-calibrated type I error rate using permuted ENCODE ChIP-seq data sets; in contrast, two commonly used gene set enrichment methods, Fisher's exact test and the binomial test implemented in Genomic Regions Enrichment of Annotations Tool (GREAT), can have highly inflated type I error rates and biases in ranking. We identify DNA-binding proteins, including CTCF, JunD and glucocorticoid receptor α (GRα), that show different enrichment patterns for peaks closer to versus further from transcription start sites. We also identify known and potential new biological functions of GRα. ChIP-Enrich is available as a web interface (http://chip-enrich.med.umich.edu) and Bioconductor package.
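To make the comparison above concrete, here is a minimal sketch of the one-sided Fisher's exact test that the description names as a commonly used baseline. It is not ChIP-Enrich itself: it treats every gene as equally likely to carry a peak and so ignores gene locus length, which is the shortcoming the description attributes to this kind of test. The function name, argument names and counts are hypothetical.

```python
from math import comb


def fisher_enrichment_p(k, n_set, n_peak, n_total):
    """One-sided Fisher's exact test p-value, P(X >= k).

    k       -- genes in the gene set that have at least one peak
    n_set   -- size of the gene set
    n_peak  -- genes (genome-wide) that have at least one peak
    n_total -- total number of genes considered
    X follows a hypergeometric distribution under the null hypothesis
    that peak presence is independent of gene-set membership.
    """
    upper = min(n_set, n_peak)
    denom = comb(n_total, n_set)
    tail = sum(
        comb(n_peak, i) * comb(n_total - n_peak, n_set - i)
        for i in range(k, upper + 1)
    )
    return tail / denom


# Hypothetical counts: 20,000 genes, 3,000 with a peak,
# a 100-gene set of which 30 have a peak (15 expected by chance).
p_value = fisher_enrichment_p(k=30, n_set=100, n_peak=3000, n_total=20000)
print(f"{p_value:.3g}")
```

Because long gene loci are more likely to contain a peak, a gene set dominated by long genes can look enriched under this test even without any biological association, which is why an adjustment for locus length is needed.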
Project description:Recent studies have identified the cis-regulatory elements in the mouse genome as well as their genomic localizations. Recent discoveries have shown that enrichment of H3 lysine 4 trimethylation (H3K4me3) marks active promoters and that the presence of H3 lysine 4 monomethylation (H3K4me1) outside promoter regions marks enhancers. In this work, we further identified highly expressed genes marked by H3K4me3 or by both H3K4me3 and H3K4me1 in mouse liver using ChIP-Seq and RNA-Seq. We found that in mice, the liver carries embryonic stem cell-related functions while the embryonic stem cell also carries liver-related functions. We also identified novel genes in RNA-Seq experiments for mouse liver and for mouse embryonic stem cells. These genes are not currently in the Ensembl gene database.