Project description:The genomic distribution of trait-associated SNPs (TASs) discovered in genome-wide association studies (GWAS) can provide insight into the genetic architecture of complex traits and the design of future studies. Here we report on a maize GWAS that identified TASs underlying five quantitative traits measured across a large panel of samples and examine the characteristics of these TASs. A set of SNPs obtained via RNA sequencing (RNA-seq), most of which are located within annotated genes (~87%), was complemented with additional SNPs from the maize HapMap Project, which contains approximately equal proportions of intragenic and intergenic SNPs. TASs were identified via a genome scan while controlling for polygenic background effects. The diverse functions of TAS-containing candidate genes indicate that complex genetic networks shape these traits. The vast majority of the TAS-containing candidate genes have dynamic expression levels among developmental stages. Overall, TASs explain 44-54% of the total phenotypic variation for these traits, with equal contributions from intragenic and intergenic TASs. Association of liguleless2 with upper leaf angle was implicated by two intragenic TASs; rough sheath1 was associated with leaf width by an upstream intergenic TAS; and Zea agamous5 was associated with days to silking by both intragenic and intergenic TASs. A large proportion (82%) of these TASs come from noncoding regions, similar to findings from human diseases and traits. However, TASs were enriched in both intergenic (53%) and promoter 5-kb (24%) regions, but under-represented in a set of nonsynonymous SNPs.
Project description:The human microbiome is a complex ecological system, and describing its structure and function under different environmental conditions is important from both basic scientific and medical perspectives. Viewed through a biostatistical lens, many microbiome analysis goals can be formulated as latent variable modeling problems. However, although probabilistic latent variable models are a cornerstone of modern unsupervised learning, they are rarely applied in the context of microbiome data analysis, in spite of the evolutionary, temporal, and count structure that could be directly incorporated through such models. We explore the application of probabilistic latent variable models to microbiome data, with a focus on latent Dirichlet allocation, non-negative matrix factorization, and dynamic unigram models. To develop guidelines for when different methods are appropriate, we perform a simulation study. We further illustrate and compare these techniques using the data of Dethlefsen and Relman (2011, Incomplete recovery and individualized responses of the human distal gut microbiota to repeated antibiotic perturbation. Proceedings of the National Academy of Sciences 108, 4554-4561), a study on the effects of antibiotics on bacterial community composition. Code and data for all simulations and case studies are publicly available.
Project description:We investigated whether intersecting functional genomic data (ATAC-seq + promoter-focused Capture C) with increasingly powered, publicly available GWAS for body mass index and waist-to-hip ratio could identify additional true-positive subsignificant signals (5x10^-8 < P value < 5x10^-4) without increasing the GWAS sample size.
Project description:In this work we present an analytical strategy to systematically identify early regulators by combining gene regulatory networks (GRNs) with GWAS. We hypothesized that early regulators in T-cell-associated diseases could be found by defining upstream transcription factors (TFs) in T-cell differentiation. Time-series expression and DNA methylation profiling of T-cell differentiation identified several upstream TFs, of which TFs involved in Th1/2 differentiation were most enriched for disease-associated SNPs identified by GWAS.