Dataset Information

In silico identification of putative causal genetic variants.


ABSTRACT: Understanding the causal genetic architecture of complex phenotypes is essential for future research into disease mechanisms and potential therapies. Despite the widespread availability of genome-wide data, existing methods for analyzing genetic data still focus primarily on marginal association models, which fall short of fully capturing the polygenic nature of complex traits and of elucidating biological causal mechanisms. Here we present a computationally efficient causal inference framework for genome-wide detection of putative causal variants underlying genetic associations. Our approach takes summary statistics from potentially overlapping studies as input, constructs in silico knockoff copies of the summary statistics as negative controls to attenuate confounding effects induced by linkage disequilibrium, and employs efficient ultrahigh-dimensional sparse regression to jointly model all genetic variants across the genome. The method requires less than 15 minutes on a single CPU to analyze genome-wide summary statistics. Applied to a meta-analysis of ten large-scale genetic studies of Alzheimer's disease (AD), it identified 82 loci associated with AD, including 37 additional loci missed by the conventional genome-wide association study (GWAS) pipeline based on marginal association testing. The identified putative causal variants achieve state-of-the-art agreement with massively parallel reporter assays and CRISPR-Cas9 experiments. Additionally, we applied the method to a retrospective analysis of large-scale GWAS summary statistics from 2013 to 2022. The results reveal the method's capacity to robustly discover additional loci for polygenic traits beyond conventional GWAS and to pinpoint potential causal variants underpinning each locus (on average, 22.7% more loci and 78.7% fewer proxy variants), contributing to a deeper understanding of complex genetic architectures in post-GWAS analyses.
We are making the discoveries and software freely available to the community and anticipate that routine end-to-end in silico identification of putative causal genetic variants will become an important tool that will facilitate downstream functional experiments and future research into disease etiology, as well as the exploration of novel therapeutic avenues.
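
The abstract describes knockoff copies serving as negative controls. As a rough illustration of how such controls can drive variable selection, the sketch below implements the generic knockoff+ selection rule from the knockoff filter literature. It is not taken from the authors' software: the function names, the input vector `w` (one importance statistic per variant, positive when the original variant outweighs its knockoff copy), and the target level `q` are all assumptions made for this example.

```python
import numpy as np


def knockoff_plus_threshold(w, q):
    """Return the smallest t whose estimated false discovery proportion is <= q.

    Hypothetical helper: `w` holds one statistic per variant, where a large
    positive value means the original variant beat its knockoff copy.
    Returns infinity when no threshold meets the target level.
    """
    w = np.asarray(w, dtype=float)
    # Candidate thresholds are the distinct nonzero magnitudes of w.
    candidates = np.sort(np.unique(np.abs(w[w != 0])))
    for t in candidates:
        # Negative statistics estimate how many positives are false.
        fdp_hat = (1 + np.sum(w <= -t)) / max(1, np.sum(w >= t))
        if fdp_hat <= q:
            return float(t)
    return float("inf")


def knockoff_select(w, q=0.1):
    """Return indices of variants whose statistic reaches the threshold."""
    w = np.asarray(w, dtype=float)
    tau = knockoff_plus_threshold(w, q)
    return np.flatnonzero(w >= tau)


if __name__ == "__main__":
    # Toy statistics, invented purely for illustration.
    w = [5, 4, 3, 2.5, 2, 1.5, -1, 0.5, -0.2, 3.5]
    print(knockoff_select(w, q=0.2))
```

The rule only compares the counts of large positive and large negative statistics, which is why the knockoff copies can act as negative controls without per-variant p-values. How the statistics themselves are computed from summary data is specific to the published method and is not reproduced here.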

SUBMITTER: He Z 

PROVIDER: S-EPMC10925326 | biostudies-literature | 2024 Mar

REPOSITORIES: biostudies-literature

Publications

Beyond guilty by association at scale: searching for causal variants on the basis of genome-wide summary statistics.

Zihuai He, Benjamin Chu, James Yang, Jiaqi Gu, Zhaomeng Chen, Linxi Liu, Tim Morrison, Michael E. Belloy, Xinran Qi, Nima Hejazi, Maya Mathur, Yann Le Guen, Hua Tang, Trevor Hastie, Iuliana Ionita-Laza, Emmanuel Candès, Chiara Sabatti

bioRxiv: the preprint server for biology, 2025-02-26


Understanding the causal genetic architecture of complex phenotypes will fuel future research into disease mechanisms and potential therapies. Here, we illustrate the power of a novel framework: it detects, starting from summary statistics, and across the entire genome, sets of variants that carry non-redundant information on the phenotypes and are therefore more likely to be causal in a biological sense. The approach, implemented in open-source software, is also computationally efficient, requi ...

Similar Datasets

| S-EPMC10586424 | biostudies-literature
| S-EPMC3902491 | biostudies-literature
| S-EPMC2801753 | biostudies-literature
2021-05-24 | GSE174534 | GEO
| S-EPMC6587547 | biostudies-literature
| S-EPMC7604888 | biostudies-literature
| S-EPMC5762310 | biostudies-literature
| S-EPMC9606389 | biostudies-literature
| S-EPMC6124172 | biostudies-literature
| S-EPMC4143216 | biostudies-literature