Dataset Information

Making the Most of Clumping and Thresholding for Polygenic Scores.


ABSTRACT: Polygenic prediction has the potential to contribute to precision medicine. Clumping and thresholding (C+T) is a widely used method to derive polygenic scores. When using C+T, several p value thresholds are tested to maximize predictive ability of the derived polygenic scores. Along with this p value threshold, we propose to tune three other hyper-parameters for C+T. We implement an efficient way to derive thousands of different C+T scores corresponding to a grid over four hyper-parameters. For example, it takes a few hours to derive 123K different C+T scores for 300K individuals and 1M variants using 16 physical cores. We find that optimizing over these four hyper-parameters improves the predictive performance of C+T in both simulations and real data applications as compared to tuning only the p value threshold. A particularly large increase can be noted when predicting depression status, from an AUC of 0.557 (95% CI: [0.544-0.569]) when tuning only the p value threshold to an AUC of 0.592 (95% CI: [0.580-0.604]) when tuning all four hyper-parameters we propose for C+T. We further propose stacked clumping and thresholding (SCT), a polygenic score that results from stacking all derived C+T scores. Instead of choosing one set of hyper-parameters that maximizes prediction in some training set, SCT learns an optimal linear combination of all C+T scores by using an efficient penalized regression. We apply SCT to eight different case-control diseases in the UK Biobank data and find that SCT substantially improves prediction accuracy, with an average AUC increase of 0.035 over standard C+T.
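
The grid of C+T scores described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names `clump` and `ct_scores` are hypothetical, the clumping window is measured in variant indices rather than base pairs, and only three hyper-parameters are shown (clumping r² threshold, window size, p value threshold). The abstract does not name the three additional hyper-parameters, so the choice of r² and window size here is an assumption made for illustration.

```python
import itertools
import numpy as np

def clump(G, pvals, r2_thr, window):
    """Greedy clumping (illustrative): visit variants from most to least
    significant, keep each one not yet removed, and remove its neighbours
    within `window` index positions whose squared correlation with it
    exceeds `r2_thr`."""
    m = G.shape[1]
    removed = np.zeros(m, dtype=bool)
    kept = []
    for j in np.argsort(pvals):
        if removed[j]:
            continue
        kept.append(j)
        for k in range(max(0, j - window), min(m, j + window + 1)):
            if k != j and not removed[k]:
                r = np.corrcoef(G[:, j], G[:, k])[0, 1]
                if r * r > r2_thr:
                    removed[k] = True
    return np.array(sorted(kept))

def ct_scores(G, betas, pvals, r2_grid, window_grid, p_grid):
    """Return one C+T score per hyper-parameter combination.

    G      : (n_individuals, n_variants) genotype matrix
    betas  : per-variant effect sizes from summary statistics
    pvals  : per-variant p values from summary statistics
    Output : (n_individuals, n_combinations) score matrix and the
             list of (r2_thr, window, p_thr) tuples for its columns.
    """
    scores, params = [], []
    for r2_thr, window in itertools.product(r2_grid, window_grid):
        kept = clump(G, pvals, r2_thr, window)  # clumping is reused across p thresholds
        for p_thr in p_grid:
            use = kept[pvals[kept] <= p_thr]    # thresholding step
            scores.append(G[:, use] @ betas[use])
            params.append((r2_thr, window, p_thr))
    return np.column_stack(scores), params
```

Standard C+T would pick the single column of the score matrix that predicts best in a training set. SCT, as described in the abstract, would instead fit a penalized regression of the phenotype on all columns to learn a linear combination of them.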

SUBMITTER: Privé F

PROVIDER: S-EPMC6904799 | biostudies-literature | 2019 Dec

REPOSITORIES: biostudies-literature

Publications

Making the Most of Clumping and Thresholding for Polygenic Scores.

Florian Privé, Bjarni J. Vilhjálmsson, Hugues Aschard, Michael G. B. Blum

American Journal of Human Genetics, 2019-11-21, issue 6


Similar Datasets

| S-EPMC8759285 | biostudies-literature
| S-EPMC7292502 | biostudies-literature
| S-EPMC7431089 | biostudies-literature
| S-EPMC5758043 | biostudies-literature
| S-EPMC4341990 | biostudies-literature
| S-EPMC7642950 | biostudies-literature
| S-EPMC8445431 | biostudies-literature
| S-EPMC8093180 | biostudies-literature
| S-EPMC7396092 | biostudies-literature
| S-EPMC5373783 | biostudies-literature