Project description: Epigenetic machinery contributes to gene regulation in eukaryotic species. However, this machinery, which includes more than 600 epigenetic regulator (ER) genes responsible for reading, writing, and erasing histone and DNA modifications, remains largely uncharacterized across species. We compile a comprehensive list of ERs based on an evolutionary analysis across 23 species, the most comprehensive cross-species ER list to date. We further perform comparative transcriptomic analyses across different tissues in humans, mice, and other amniote species. We observe a consistent tissue-of-origin expression specificity pattern of duplicated ER genes across species and suggest links between expression specificity and both ER gene evolution and ER function. Additional analyses further suggest that ER duplication can generate tissue-specific ER genes with the same epigenetic substrates, which may be closely related to their regulatory specificity in tissue development. Our work can serve as a foundation for better understanding the tissue-specific expression patterns of ER genes from an evolutionary perspective, as well as the functional implications of ERs in tissue-specific epigenetic regulation.
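Expression specificity can be quantified in several ways, and the abstract does not say which score was used. The minimal sketch below uses the tau index, a common tissue-specificity score that compares each tissue's expression with the gene's maximum. The function name and inputs are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np


def tau(expression):
    """Tissue-specificity index tau for a single gene.

    expression: non-negative values, one per tissue (often log-transformed first).
    Returns 0.0 for uniform expression and 1.0 for expression in a single tissue.
    """
    x = np.asarray(expression, dtype=float)
    if x.size < 2:
        raise ValueError("need at least two tissues")
    if x.max() <= 0:
        raise ValueError("gene is not expressed in any tissue")
    x_hat = x / x.max()  # normalize to the maximally expressing tissue
    return float(np.sum(1.0 - x_hat) / (x.size - 1))


print(tau([10, 10, 10, 10]))  # ubiquitous gene
print(tau([0, 0, 0, 8]))      # expressed in one tissue only
print(tau([8, 4, 0, 0]))      # intermediate specificity
```

For example, `tau([10, 10, 10, 10])` is 0 (ubiquitous) and `tau([0, 0, 0, 8])` is 1 (single tissue). Computing tau on log-transformed values is a common choice to damp the influence of a few very highly expressed tissues.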
Project description: The epigenetic landscape varies greatly among cell types. Although a variety of writers, readers, and erasers of epigenetic features are known, we have little information about the underlying regulatory systems controlling the establishment and maintenance of these features. Here, we have explored how natural genetic variation affects the epigenome in mice. Studying levels of H3K4me3, a histone modification found at sites such as promoters, enhancers, and recombination hotspots, we found tissue-specific trans-regulation of H3K4me3 levels in four highly diverse cell types: male germ cells, embryonic stem cells, hepatocytes, and cardiomyocytes. To identify the genetic loci involved, we measured H3K4me3 levels in male germ cells in a mapping population of 59 BXD recombinant inbred lines. We found extensive trans-regulation of H3K4me3 peaks, including six major histone quantitative trait loci (QTL). These chromatin regulatory loci act dominantly to suppress H3K4me3, which at hotspots reduces the likelihood of subsequent DNA double-strand breaks. QTL locations do not correspond with genes encoding enzymes known to metabolize chromatin features. Instead, their locations match clusters of zinc finger genes, making these genes possible candidates for the dominant suppression of H3K4me3. Collectively, these data describe an extensive set of chromatin regulatory loci that control the epigenetic landscape.
Project description: Although all human tissues carry out common processes, tissues are distinguished by gene expression patterns, implying that distinct regulatory programs control tissue specificity. In this study, we investigate gene expression and regulation across 38 tissues profiled in the Genotype-Tissue Expression project. We find that network edges (transcription factor to target gene connections) have higher tissue specificity than network nodes (genes) and that regulating nodes (transcription factors) are less likely than their targets (genes) to be expressed in a tissue-specific manner. Gene set enrichment analysis of network targeting also indicates that the regulation of tissue-specific function is largely independent of transcription factor expression. In addition, tissue-specific genes are not highly targeted in their corresponding tissue network. However, they do occupy bottleneck positions owing to variability in transcription factor targeting and the influence of non-canonical regulatory interactions. These results suggest that tissue specificity is driven by context-dependent regulatory paths, providing transcriptional control of tissue-specific processes.
Project description: Transcription is regulated by many factors that act in concert to switch genes between activity states. Eukaryotic transcription involves a multitude of complexes that sequentially assemble on chromatin under the influence of transcription factors and the dynamic state of chromatin. Prokaryotic transcription depends on transcription factors, sigma factors, and, in some cases, on DNA looping. We present a stochastic model of transcription that considers these complex regulatory mechanisms. We coarse-grain the molecular details in such a way that the model can describe a broad class of gene-regulation mechanisms. We solve this model analytically for various measures of stochastic transcription and compare alternative gene-regulation designs. We find that genes with complex multiprotein regulation can have peaked burst-size distributions, in contrast to the geometric distributions found for simple models of transcription regulation. Burst-size distributions are, in addition, shaped by mRNA degradation during transcription bursts. We derive the stochastic properties of genes in the limit of deterministic switch times. These genes typically have reduced transcription noise. Severe timescale separation between gene regulation and transcription initiation enhances noise and leads to bimodal mRNA copy number distributions. In general, complex mechanisms for gene regulation lead to nonexponential waiting-time distributions for gene switching and transcription initiation, which typically reduce noise in mRNA copy numbers and burst size. Finally, we discuss how qualitatively different gene-regulation models can often fit the same experimental data on single-cell mRNA abundance, even though they differ in burst-size statistics and regulatory parameters.
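The contrast between geometric and peaked burst-size distributions can be illustrated with a small Monte Carlo sketch. This is not the authors' analytical model. As a hypothetical stand-in for complex multiprotein regulation, the gene leaves the ON state only after several sequential steps, which makes the ON duration Erlang-distributed rather than exponential. Initiation is assumed to be Poisson while the gene is ON, mRNA degradation during the burst is ignored, and all parameter values are arbitrary.

```python
import numpy as np


def simulate_burst_sizes(k_ini, mean_on_time, n_off_steps, n_bursts, seed=0):
    """Sample burst sizes (transcripts initiated per ON period).

    Hypothetical setup: the gene leaves ON through `n_off_steps` sequential,
    irreversible steps of equal rate, so the ON duration is Gamma (Erlang)
    distributed with mean `mean_on_time`. Initiation is Poisson with rate
    `k_ini` while ON; mRNA degradation during the burst is ignored.
    """
    rng = np.random.default_rng(seed)
    step_rate = n_off_steps / mean_on_time  # keeps the mean ON time fixed
    on_times = rng.gamma(shape=n_off_steps, scale=1.0 / step_rate, size=n_bursts)
    return rng.poisson(k_ini * on_times)


# Same mean burst size (k_ini * mean_on_time = 10) in both designs
simple = simulate_burst_sizes(k_ini=10.0, mean_on_time=1.0, n_off_steps=1, n_bursts=100_000)
multi = simulate_burst_sizes(k_ini=10.0, mean_on_time=1.0, n_off_steps=10, n_bursts=100_000)

for name, sizes in [("one-step OFF", simple), ("ten-step OFF", multi)]:
    mode = np.bincount(sizes).argmax()
    print(f"{name}: mean={sizes.mean():.2f}, var={sizes.var():.1f}, "
          f"mode={mode}, P(0)={np.mean(sizes == 0):.3f}")
```

With one OFF step, the burst size is geometric with its mode at zero. With ten OFF steps and the same mean, the distribution peaks close to the mean, zero-size bursts become rare, and the variance drops. This is consistent with the abstract's point that nonexponential waiting times tend to reduce burst-size noise.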
Project description: Although genome-wide association studies have identified over 100 risk loci that explain ∼33% of familial risk for prostate cancer (PrCa), their functional effects on risk remain largely unknown. Here we use genotype data from 59,089 men of European and African American ancestries combined with cell-type-specific epigenetic data to build a genomic atlas of single-nucleotide polymorphism (SNP) heritability in PrCa. We find significant differences in heritability between variants in prostate-relevant epigenetic marks defined in normal versus tumour tissue, as well as between tissue and cell lines. The majority of SNP heritability lies in regions marked by H3K27 acetylation in a prostate adenocarcinoma cell line (LNCaP) or by DNase I hypersensitive sites in cancer cell lines. We find a high degree of similarity between European and African American ancestries, suggesting a similar genetic architecture from common variation underlying PrCa risk. Our findings showcase the power of integrating functional annotation with genetic data to understand the genetic basis of PrCa.
Project description: Whole-tissue RNA-seq is the standard approach for studying gene expression divergence in evolutionary biology and provides a snapshot of the comprehensive transcriptome for a given tissue. However, whole tissues consist of diverse cell types differing in expression profiles, and the cellular composition of these tissues can evolve across species. Here, we investigate the effects of differences in cellular composition on whole-tissue expression profiles. We compared gene expression from whole testes and from enriched spermatogenic cell populations in two species of house mice, Mus musculus musculus and M. m. domesticus, and their sterile and fertile F1 hybrids, which differ in both cellular composition and regulatory dynamics. We found that cellular composition differences skewed expression profiles and differential gene expression in whole-testes samples. Importantly, both approaches were able to detect large-scale patterns such as disrupted X chromosome expression, although whole-testes sampling resulted in decreased power to detect differentially expressed genes. We encourage researchers to account for histology in RNA-seq and to consider methods that reduce sample complexity whenever feasible. Ultimately, we show that differences in cellular composition between tissues can modify expression profiles, potentially altering inferred gene ontology processes, insights into gene network evolution, and conclusions about the processes governing gene expression evolution.
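Composition alone can shift whole-tissue profiles because bulk expression behaves, to a first approximation, like a proportion-weighted sum of cell-type profiles. The toy linear mixture below uses made-up expression values, cell-type labels, and proportions. It holds each cell type's expression fixed and changes only the proportions, so it illustrates the confound and does not reproduce the study's analysis.

```python
import numpy as np

# Hypothetical mean expression (arbitrary units); rows are genes, columns are
# cell types. The values are illustrative only and identical in both groups,
# so no cell type changes its own expression between groups.
cell_types = ["somatic", "spermatocyte", "spermatid"]
genes = ["gene_1", "gene_2", "gene_3"]
profile = np.array([
    [50.0,  5.0,  1.0],   # gene_1: mostly somatic
    [ 2.0, 40.0,  4.0],   # gene_2: mostly meiotic
    [ 1.0,  3.0, 60.0],   # gene_3: mostly postmeiotic
])

# Hypothetical cell-type proportions in whole-tissue samples; group B has fewer
# postmeiotic cells, as might happen with disrupted spermatogenesis.
props_a = np.array([0.20, 0.30, 0.50])
props_b = np.array([0.40, 0.40, 0.20])

# Linear mixture model: bulk expression = profile weighted by proportions
bulk_a = profile @ props_a
bulk_b = profile @ props_b
log2_fc = np.log2(bulk_b / bulk_a)

for gene, fc in zip(genes, log2_fc):
    print(f"{gene}: apparent log2 fold change B vs A = {fc:+.2f}")
```

Every gene shows an apparent fold change even though no cell type altered its expression. The postmeiotic marker `gene_3` appears down-regulated by about 1.2 log2 units only because group B contains fewer postmeiotic cells. Enriching cell populations before sequencing, as the study does, removes most of this confound.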
Project description: Microglia, the innate immune cells of the central nervous system, have been genetically implicated in multiple neurodegenerative diseases. We previously mapped the genetic regulation of gene expression and mRNA splicing in human microglia, identifying several loci where common genetic variants in microglia-specific regulatory elements explain disease risk loci identified by GWAS. However, identifying genetic effects on splicing has been challenging due to the use of short sequencing reads to identify causal isoforms. Here we present the isoform-centric microglia genomic atlas (isoMiGA) which leverages the power of long-read RNA-seq to identify 35,879 novel microglia isoforms. We show that the novel microglia isoforms are involved in stimulation response and brain region specificity. We then quantified the expression of both known and novel isoforms in a multi-ethnic meta-analysis of 555 human microglia short-read RNA-seq samples from 391 donors, the largest to date, and found associations with genetic risk loci in Alzheimer's disease and Parkinson's disease. We nominate several loci that may act through complex changes in isoform and splice site usage.
Project description: Human biology is rooted in highly specialized cell types programmed by a common genome, 98% of which lies outside of genes. Genetic variation in this enormous noncoding space is linked to the majority of disease risk. To address the problem of linking these variants to expression changes in primary human cells, we introduce ExPectoSC, an atlas of modular deep-learning-based models for predicting cell-type-specific gene expression directly from sequence. We provide models for 105 primary human cell types covering 7 organ systems, demonstrate their accuracy, and then apply them to prioritize relevant cell types for complex human diseases. The resulting atlas of sequence-based gene expression and variant effects is publicly available in a user-friendly interface and readily extensible to any primary cell type. We demonstrate the accuracy of our approach through systematic evaluations and apply the models to prioritize ClinVar clinical variants of uncertain significance, verifying our top predictions experimentally.
Project description: Gene coexpression relationships that are phylogenetically conserved between human and mouse have been shown to provide important clues about gene function that can be efficiently used to identify promising candidate genes for human hereditary disorders. In the past, such approaches have mostly considered generic gene expression profiles that cover multiple tissues and organs. The individual genes of multicellular organisms, however, can participate in different transcriptional programs, operating at scales as different as single cell types, tissues, organs, body regions, or the entire organism. Therefore, systematic analysis of tissue-specific coexpression could, in principle, be a very powerful strategy to dissect those functional relationships among genes that emerge only in particular tissues or organs. In this report, we show that conserved coexpression determined from tissue-specific and condition-specific data sets can indeed predict many functional relationships that are not detected by analyzing heterogeneous microarray data sets. More importantly, we find that, when combined with disease networks, the simultaneous use of both generic (multi-tissue) and tissue-specific conserved coexpression allows a more efficient prediction of human disease genes than the use of generic conserved coexpression alone. Using this strategy, we were able to identify high-probability candidates for 238 orphan disease loci. We provide proof of concept that this combined use of generic and tissue-specific conserved coexpression can be very useful for prioritizing the mutational candidates obtained from deep-sequencing projects, even for genetic disorders as heterogeneous as XLMR.
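Conserved coexpression, and the gain from adding tissue-specific data, can be sketched as set operations on coexpressed gene pairs. The abstract does not specify the similarity measure, cutoff, or combination rule. Pearson correlation, the 0.7 threshold, and taking the union of generic and tissue-specific links are therefore all assumptions. The data are simulated, and the rows of the human and mouse matrices are assumed to be aligned one-to-one orthologs.

```python
import numpy as np

rng = np.random.default_rng(1)


def simulate(linked_pairs, n_genes=4, n_samples=60):
    """Toy expression matrix (genes x samples); each listed pair shares a driver."""
    expr = rng.normal(size=(n_genes, n_samples))
    for a, b in linked_pairs:
        driver = rng.normal(size=n_samples)
        expr[a] += 4 * driver
        expr[b] += 4 * driver
    return expr


def coexpressed_pairs(expr, threshold):
    """Gene-index pairs (i, j), i < j, with Pearson r >= threshold (threshold > 0)."""
    corr = np.corrcoef(expr)                      # rows are genes
    upper = np.triu(corr, k=1)                    # zero the diagonal and lower triangle
    i, j = np.where(upper >= threshold)
    return set(zip(i.tolist(), j.tolist()))


def conserved_coexpression(human_expr, mouse_expr, threshold=0.7):
    """Pairs coexpressed in both species (rows must be aligned orthologs)."""
    return coexpressed_pairs(human_expr, threshold) & coexpressed_pairs(mouse_expr, threshold)


# Generic (multi-tissue) data: pair (0, 1) is linked in both species,
# pair (2, 3) only in human. Tissue-specific data: (2, 3) is linked in both.
generic = conserved_coexpression(simulate([(0, 1), (2, 3)]), simulate([(0, 1)]))
tissue = conserved_coexpression(simulate([(2, 3)]), simulate([(2, 3)]))
combined = generic | tissue

print("generic:", sorted(generic))
print("tissue-specific:", sorted(tissue))
print("combined:", sorted(combined))
```

Here the pair (2, 3) is coexpressed in both species only in the tissue-specific data, so the generic analysis alone misses it while the combined set recovers it. In a real analysis, the combined links would then be intersected with disease networks to rank candidate genes.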