Dataset Information

A penalized regression framework for building polygenic risk models based on summary statistics from genome-wide association studies and incorporating external information.

ABSTRACT: Large-scale genome-wide association studies (GWAS) provide opportunities for developing genetic risk prediction models that have the potential to improve disease prevention, intervention or treatment. The key step is to develop polygenic risk score (PRS) models with high predictive performance for a given disease, which typically requires a large training data set for selecting truly associated single nucleotide polymorphisms (SNPs) and estimating effect sizes accurately. Here, we develop a comprehensive penalized regression framework for fitting ℓ₁-regularized regression models to GWAS summary statistics. We propose incorporating Pleiotropy and ANnotation information into PRS (PANPRS) development through suitable formulation of penalty functions and associated tuning parameters. Extensive simulations show that PANPRS performs as well as or better than existing PRS methods when no functional annotation or pleiotropy is incorporated. When functional annotation data and pleiotropy are informative, PANPRS substantially outperforms existing PRS methods in simulations. Finally, we applied our methods to build PRS for type 2 diabetes and melanoma and found that incorporating relevant functional annotations and GWAS of genetically related traits improved prediction of these two complex diseases.
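
The general idea of fitting an ℓ₁-regularized regression to summary statistics, rather than to individual-level genotypes, can be illustrated with a minimal sketch. This is not the PANPRS penalty, which the abstract says also incorporates annotation and pleiotropy information; it is only a plain lasso by coordinate descent. It assumes standardized genotypes and phenotype, so the objective can be written as (1/2) b'Rb - b'r + lam * ||b||_1. The names r (marginal SNP-trait correlations), R (SNP-SNP correlation, or LD, matrix from a reference panel), lam, and the function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values with |z| <= lam become exactly 0."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

def lasso_summary_stats(r, R, lam, n_iter=200, tol=1e-8):
    """Hypothetical helper: plain lasso fitted from summary statistics only.

    r   : (p,) marginal SNP-trait correlations (assumed standardized data)
    R   : (p, p) SNP-SNP correlation (LD) matrix, e.g. from a reference panel
    lam : l1 penalty weight (a single tuning parameter in this sketch)
    """
    r = np.asarray(r, dtype=float)
    R = np.asarray(R, dtype=float)
    beta = np.zeros(len(r))
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(len(r)):
            # Marginal correlation of SNP j after removing the contribution
            # of every other SNP currently in the model.
            z = r[j] - R[j] @ beta + R[j, j] * beta[j]
            new_beta_j = soft_threshold(z, lam) / R[j, j]
            max_change = max(max_change, abs(new_beta_j - beta[j]))
            beta[j] = new_beta_j
        if max_change < tol:
            break
    return beta

# Toy usage with three uncorrelated SNPs (R is the identity matrix).
r_toy = np.array([0.5, -0.3, 0.05])
beta_toy = lasso_summary_stats(r_toy, np.eye(3), lam=0.1)
```

With an identity LD matrix each coefficient is simply the soft-thresholded marginal correlation, so weak signals are set exactly to zero. A method along the lines described in the abstract would replace the single lam with penalties that vary by annotation or across related traits.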

SUBMITTER: Chen TH

PROVIDER: S-EPMC8414872 | biostudies-literature | 2021

REPOSITORIES: biostudies-literature

Publications

A penalized regression framework for building polygenic risk models based on summary statistics from genome-wide association studies and incorporating external information.

Ting-Huei Chen, Nilanjan Chatterjee, Maria Teresa Landi, Jianxin Shi

Journal of the American Statistical Association, 2020-10-12, issue 533

PMID: 34483403

Similar Datasets

Penalized regression and model selection methods for polygenic scores on summary statistics.
| S-EPMC7553329 | biostudies-literature

Multivariate extension of penalized regression on summary statistics to construct polygenic risk scores for correlated traits.
| S-EPMC10276147 | biostudies-literature

Improved polygenic prediction by Bayesian multiple regression on summary statistics.
| S-EPMC6841727 | biostudies-literature

Across-Platform Imputation of DNA Methylation Levels Incorporating Nonlocal Information Using Penalized Functional Regression.
| S-EPMC4862742 | biostudies-literature

Incorporating Functional Annotations for Fine-Mapping Causal Variants in a Bayesian Framework Using Summary Statistics.
| S-EPMC5105870 | biostudies-literature

Overestimated prediction using polygenic prediction derived from summary statistics.
| S-EPMC10500750 | biostudies-literature

A synthetic data integration framework to leverage external summary-level information from heterogeneous populations.
| S-EPMC10480346 | biostudies-literature

PUMAS: fine-tuning polygenic risk scores with GWAS summary statistics.
| S-EPMC8419981 | biostudies-literature

Optimizing and benchmarking polygenic risk scores with GWAS summary statistics.
| S-EPMC11462675 | biostudies-literature

Non-parametric Polygenic Risk Prediction via Partitioned GWAS Summary Statistics.
| S-EPMC7332650 | biostudies-literature