Project description:Genome-wide association studies have identified loci underlying human diseases, but the causal nucleotide changes and mechanisms remain largely unknown. Here we developed a fine-mapping algorithm to identify candidate causal variants for 21 autoimmune diseases from genotyping data. We integrated these predictions with transcription and cis-regulatory element annotations, derived by mapping RNA and chromatin in primary immune cells, including resting and stimulated CD4(+) T-cell subsets, regulatory T cells, CD8(+) T cells, B cells, and monocytes. We find that ∼90% of causal variants are non-coding, with ∼60% mapping to immune-cell enhancers, many of which gain histone acetylation and transcribe enhancer-associated RNA upon immune stimulation. Causal variants tend to occur near binding sites for master regulators of immune differentiation and stimulus-dependent gene activation, but only 10-20% directly alter recognizable transcription factor binding motifs. Rather, most non-coding risk variants, including those that alter gene expression, affect non-canonical sequence determinants not well-explained by current gene regulatory models.
Project description:Fine-mapping refines genotype-phenotype association signals to identify causal variants underlying complex traits. However, current methods typically focus on individual genomic segments without considering the global genetic architecture. Here, we demonstrate the advantages of performing genome-wide fine-mapping (GWFM) and develop methods to facilitate GWFM. In simulations and real data analyses, GWFM outperforms current methods in error control, mapping power and precision, replication rate, and trans-ancestry phenotype prediction. For 48 well-powered traits in the UK Biobank, we identify causal variants that collectively explain 17% of the SNP-based heritability, and predict that fine-mapping 50% of that would require 2 million samples on average. We pinpoint a known causal variant, as proof-of-principle, at FTO for body mass index, unveil a hidden secondary variant with evolutionary conservation, and identify new missense causal variants for schizophrenia and Crohn's disease. Overall, we analyse 599 complex traits with 13 million SNPs, highlighting the efficacy of GWFM with functional annotations.
Project description:Advancing from statistical associations of complex traits with genetic markers to understanding the functional genetic variants that influence traits is often a complex process. Fine-mapping can select and prioritize genetic variants for further study, yet the multitude of analytical strategies and study designs makes it challenging to choose an optimal approach. We review the strengths and weaknesses of different fine-mapping approaches, emphasizing the main factors that affect performance. Topics include interpreting results from genome-wide association studies (GWAS), the role of linkage disequilibrium, statistical fine-mapping approaches, trans-ethnic studies, genomic annotation and data integration, and other analysis and design issues.
Project description:Increasingly large Genome-Wide Association Studies (GWAS) have yielded numerous variants associated with many complex traits, motivating the development of "fine mapping" methods to identify which of the associated variants are causal. Additionally, GWAS of the same trait for different populations are increasingly available, raising the possibility of refining fine mapping results further by leveraging different linkage disequilibrium (LD) structures across studies. Here, we introduce multiple study causal variants identification in associated regions (MsCAVIAR), a method that extends the popular CAVIAR fine mapping framework to a multiple study setting using a random effects model. MsCAVIAR only requires summary statistics and LD as input, accounts for uncertainty in association statistics using a multivariate normal model, allows for multiple causal variants at a locus, and explicitly models the possibility of different SNP effect sizes in different populations. We demonstrate the efficacy of MsCAVIAR in both a simulation study and a trans-ethnic, trans-biobank fine mapping analysis of High Density Lipoprotein (HDL).
Project description:To date, genome-wide association studies (GWASs) have discovered 35 susceptibility loci for leprosy; however, the cumulative effects of these loci only partially explain the overall risk of leprosy, and the causal variants and genes within these loci remain unknown. Here, we conducted new GWASs in two independent cohorts of 5007 cases and 4579 controls and then performed a meta-analysis of these newly generated datasets together with multiple previously published datasets (2277 cases and 3159 controls). Three novel and 15 previously reported risk loci were identified from these datasets, increasing the genetic heritability explained by known leprosy risk loci from 23.0% to 38.5%. A comprehensive fine-mapping analysis was conducted, and 19 causal variants and 14 causal genes were identified. Specifically, manual checking of epigenomic information from the Epimap database revealed that the causal variants were mainly located within immune-relevant or immune-specific regulatory elements. Furthermore, by using gene-set, tissue, and cell-type enrichment analyses, we highlighted the key roles of immune-related tissues and cells and implicated the PD-1 signaling pathways in the pathogenetic mechanism of leprosy. Collectively, our study identified candidate causal variants and elucidated the potential regulatory and coding mechanisms for genes associated with leprosy.
Project description:Recent genome-wide association studies have identified 78 loci associated with Parkinson's disease susceptibility, but the underlying mechanisms remain largely unclear. To identify likely causal variants for disease risk, we fine-mapped these Parkinson's-associated loci using four different fine-mapping methods. We then integrated multi-assay cell type-specific epigenomic profiles to pinpoint the likely mechanism of action of each variant, allowing us to identify Consensus single-nucleotide polymorphisms (SNPs) that disrupt LRRK2 and FCGR2A regulatory elements in microglia, an MBNL2 enhancer in oligodendrocytes, and a DYRK1A enhancer in neurons. This genome-wide functional fine-mapping investigation of Parkinson's disease substantially advances our understanding of the causal mechanisms underlying this complex disease while avoiding focus on spurious, non-causal mechanisms. Together, these results provide a robust, comprehensive list of the likely causal variants, genes and cell types underlying Parkinson's disease risk, as demonstrated by consistently greater enrichment of our fine-mapped SNPs relative to lead GWAS SNPs across independent functional impact annotations. In addition, our approach prioritized an average of 3/85 variants per locus as putatively causal, making downstream experimental studies both more tractable and more likely to yield disease-relevant, actionable results.
Large-scale studies comparing individuals with Parkinson's disease to age-matched controls have identified many regions of the genome associated with the disease. However, there is widespread correlation between different parts of the genome, making it difficult to tell which genetic variants cause Parkinson's and which are simply co-inherited with causal variants. We therefore applied a suite of statistical models to identify the most likely causal genetic variants (i.e. fine-mapping). We then linked these genetic variants with epigenomic and gene expression signatures across a wide variety of tissues and cell types to identify how these variants cause disease. This study therefore provides a comprehensive and robust list of cellular and molecular mechanisms that may serve as targets in the development of more effective Parkinson's therapeutics.
Project description:Migraine is a highly prevalent neurovascular disorder for which genome-wide association studies (GWAS) have identified over one hundred risk loci, yet the causal variants and genes remain mostly unknown. Here, we meta-analyzed three migraine GWAS including 98,374 cases and 869,160 controls and identified 122 independent risk loci of which 35 were new. Fine-mapping of a meta-analysis is challenging because some variants may be missing from some participating studies and accurate linkage disequilibrium (LD) information of the variants is often not available. Here, using the exact in-sample LD, we first investigated which statistics could reliably capture the quality of fine-mapping when only reference LD was available. We observed that the posterior expected number of causal variants best distinguished between the high- and low-quality results. Next, we performed fine-mapping for 102 autosomal risk regions using FINEMAP. We produced high-quality fine-mapping for 93 regions and defined 181 distinct credible sets. Among the high-quality credible sets were 7 variants with very high posterior inclusion probability (PIP > 0.9) and 2 missense variants with PIP > 0.5 (rs6330 in NGF and rs1133400 in INPP5A). For 35 association signals, we managed to narrow down the set of potential risk variants to at most 5 variants.
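The credible sets mentioned above can be illustrated with a minimal sketch. It assumes a single association signal with one posterior probability per variant, and it builds the smallest set of variants whose probabilities reach a chosen coverage. The function name `credible_set`, the 0.95 default coverage, and the variant labels are hypothetical choices for illustration; this is not FINEMAP's implementation, which handles multiple causal variants per region.

```python
from typing import Dict, List


def credible_set(posteriors: Dict[str, float], coverage: float = 0.95) -> List[str]:
    """Return the smallest set of variants whose normalized posterior
    probabilities sum to at least `coverage` (single-signal assumption)."""
    total = sum(posteriors.values())
    if total <= 0:
        raise ValueError("posterior probabilities must sum to a positive value")

    # Rank variants from most to least probable.
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)

    chosen: List[str] = []
    cumulative = 0.0
    for variant, prob in ranked:
        chosen.append(variant)
        cumulative += prob / total  # normalize so the probabilities sum to 1
        if cumulative >= coverage:
            break
    return chosen


# Hypothetical posteriors for four variants in one signal.
example = {"var1": 0.60, "var2": 0.30, "var3": 0.07, "var4": 0.03}
print(credible_set(example))
```

A smaller credible set means the signal has been narrowed to fewer candidate variants, which is the sense in which the abstract reports signals resolved to at most 5 variants.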
Project description:Standard statistical approaches for prioritization of variants for functional testing in fine-mapping studies either use marginal association statistics or estimate posterior probabilities for variants to be causal under simplifying assumptions. Here, we present a probabilistic framework that integrates association strength with functional genomic annotation data to improve accuracy in selecting plausible causal variants for functional validation. A key feature of our approach is that it empirically estimates the contribution of each functional annotation to the trait of interest directly from summary association statistics while allowing for multiple causal variants at any risk locus. We devise efficient algorithms that estimate the parameters of our model across all risk loci to further increase performance. Using simulations starting from the 1000 Genomes data, we find that our framework consistently outperforms the current state-of-the-art fine-mapping methods, reducing the number of variants that need to be selected to capture 90% of the causal variants from an average of 13.3 to 10.4 SNPs per locus (as compared to the next-best performing strategy). Furthermore, we introduce a cost-to-benefit optimization framework for determining the number of variants to be followed up in functional assays and assess its performance using real and simulation data. We validate our findings using a large scale meta-analysis of four blood lipids traits and find that the relative probability for causality is increased for variants in exons and transcription start sites and decreased in repressed genomic regions at the risk loci of these traits. Using these highly predictive, trait-specific functional annotations, we estimate causality probabilities across all traits and variants, reducing the size of the 90% confidence set from an average of 17.5 to 13.5 variants per locus in this data.
Project description:Two recently developed fine-mapping methods, CAVIAR and PAINTOR, demonstrate better performance over other fine-mapping methods. They also have the advantage of using only the marginal test statistics and the correlation among SNPs. Both methods leverage the fact that the marginal test statistics asymptotically follow a multivariate normal distribution and are likelihood based. However, their relationship with Bayesian fine mapping, such as BIMBAM, is not clear. In this study, we first show that CAVIAR and BIMBAM are actually approximately equivalent to each other. This leads to a fine-mapping method using marginal test statistics in the Bayesian framework, which we call CAVIAR Bayes factor (CAVIARBF). Another advantage of the Bayesian framework is that it can answer both association and fine-mapping questions. We also used simulations to compare CAVIARBF with other methods under different numbers of causal variants. The results showed that both CAVIARBF and BIMBAM have better performance than PAINTOR and other methods. Compared to BIMBAM, CAVIARBF has the advantage of using only marginal test statistics and takes about one-quarter to one-fifth of the running time. We applied different methods on two independent cohorts of the same phenotype. Results showed that CAVIARBF, BIMBAM, and PAINTOR selected the same top 3 SNPs; however, CAVIARBF and BIMBAM had better consistency in selecting the top 10 ranked SNPs between the two cohorts. Software is available at https://bitbucket.org/Wenan/caviarbf.
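To make the idea of Bayesian fine-mapping from marginal test statistics concrete, the sketch below covers only the simplest case: exactly one causal variant per locus and equal prior probability for every SNP. Under that assumption each SNP's Bayes factor depends on its own z-score alone, comparing z ~ N(0, 1 + W) against the null z ~ N(0, 1), where W is the prior variance of the non-centrality parameter. The function name and the value W = 10 are illustrative assumptions, and this is not the CAVIARBF software, which also uses the SNP correlation matrix to handle multiple causal variants.

```python
import math
from typing import List


def single_causal_posteriors(z_scores: List[float], prior_variance: float = 10.0) -> List[float]:
    """Posterior probability that each SNP is the causal one, assuming exactly
    one causal SNP per locus and equal priors (illustrative simplification)."""
    if not z_scores:
        raise ValueError("z_scores must not be empty")

    # log BF_i = -0.5 * log(1 + W) + 0.5 * z_i^2 * W / (1 + W)
    shrink = prior_variance / (1.0 + prior_variance)
    log_bfs = [-0.5 * math.log1p(prior_variance) + 0.5 * z * z * shrink for z in z_scores]

    # Subtract the maximum before exponentiating to avoid overflow.
    max_log = max(log_bfs)
    weights = [math.exp(lb - max_log) for lb in log_bfs]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical marginal z-scores for three SNPs at one locus.
posteriors = single_causal_posteriors([5.0, 4.0, 1.0])
print([round(p, 3) for p in posteriors])
```

Working in log space keeps the calculation stable for very large z-scores, where the raw Bayes factors would overflow a floating-point number.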
Project description:Background: Obesity is highly influenced by heritability and variant effects. While previous genome-wide association studies (GWASs) have successfully identified numerous genetic loci associated with obesity-related traits [body mass index (BMI) and waist-to-hip ratio (WHR)], most causal variants remain unidentified. The high degree of linkage disequilibrium (LD) throughout the genome makes it extremely difficult to distinguish the GWAS-associated SNPs that exert a true biological effect. Objective: This study aimed to identify the potential causal variants having a biological effect on obesity-related traits. Methods: We used Probabilistic Annotation INTegratOR (PAINTOR), a Bayesian fine-mapping method that incorporates genetic association data (GWAS summary statistics), LD structure, and functional annotations to calculate a posterior probability of causality for SNPs across all loci of interest. Moreover, we performed gene expression analysis using available public transcriptomic data to partially validate the genes corresponding to the potential causal SNPs. Results: We identified 96 SNPs for BMI and 43 SNPs for WHR with a high posterior probability of causality (> 99%), including 49 BMI SNPs and 24 WHR SNPs that did not reach genome-wide significance in the original GWAS. Finally, we partially validated some genes corresponding to the potential causal SNPs. Conclusion: Using a statistical fine-mapping approach, we identified a set of potential causal variants to be prioritized for future functional validation and also detected some novel trait-associated variants. These results provide novel insight into the genetics of obesity and also demonstrate that fine mapping may improve upon the results identified by the original GWASs.