Dataset Information

Iterative hard thresholding in genome-wide association studies: Generalized linear models, prior weights, and double sparsity.

ABSTRACT: Consecutive testing of single nucleotide polymorphisms (SNPs) is usually employed to identify genetic variants associated with complex traits. Ideally one should model all covariates in unison, but most existing analysis methods for genome-wide association studies (GWAS) perform only univariate regression. We extend and efficiently implement iterative hard thresholding (IHT) for multiple regression, treating all SNPs simultaneously. Our extensions accommodate generalized linear models, prior information on genetic variants, and grouping of variants. In our simulations, IHT recovers up to 30% more true predictors than SNP-by-SNP association testing and exhibits a 2-3 orders of magnitude decrease in false-positive rates compared with lasso regression. We also test IHT on the UK Biobank hypertension phenotypes and the Northern Finland Birth Cohort of 1966 cardiovascular phenotypes. We find that IHT scales to the large datasets of contemporary human genetics and recovers the plausible genetic variants identified by previous studies. Our real data analysis and simulation studies suggest that IHT can (i) recover highly correlated predictors, (ii) avoid over-fitting, (iii) deliver better true-positive and false-positive rates than either marginal testing or lasso regression, (iv) recover unbiased regression coefficients, (v) exploit prior information and group-sparsity, and (vi) be used with biobank-sized datasets. Although these advances are studied for genome-wide association studies inference, our extensions are pertinent to other regression problems with large numbers of predictors.
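
The core idea behind iterative hard thresholding can be illustrated with a minimal sketch for ordinary least squares only. This is not the authors' implementation, which extends IHT to generalized linear models, prior weights, and group sparsity. The function names (`hard_threshold`, `iht_least_squares`), the fixed step size, the iteration count, and the simulated noise-free data are all assumptions made for illustration.

```python
import random


def hard_threshold(beta, k):
    """Keep the k largest-magnitude entries of beta and zero out the rest."""
    order = sorted(range(len(beta)), key=lambda j: abs(beta[j]), reverse=True)
    keep = set(order[:k])
    return [b if j in keep else 0.0 for j, b in enumerate(beta)]


def iht_least_squares(X, y, k, step, iters=100):
    """Projected gradient descent on 0.5 * ||y - X beta||^2 with at most k nonzeros.

    The fixed step size is an illustrative assumption; it must be small
    relative to the largest eigenvalue of X^T X for the iteration to be stable.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        # Residuals under the current sparse estimate.
        resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        # X^T resid is the negative gradient of the least-squares loss.
        direction = [sum(X[i][j] * resid[i] for i in range(n)) for j in range(p)]
        # Gradient step over all predictors at once, then project onto k-sparse vectors.
        beta = hard_threshold([beta[j] + step * direction[j] for j in range(p)], k)
    return beta


# Hypothetical toy data: 200 samples, 10 predictors, 3 truly nonzero effects, no noise.
random.seed(0)
n, p = 200, 10
beta_true = [0.0] * p
beta_true[1], beta_true[4], beta_true[7] = 2.0, -2.0, 1.5
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [sum(X[i][j] * beta_true[j] for j in range(p)) for i in range(n)]

beta_hat = iht_least_squares(X, y, k=3, step=0.5 / n)
support = [j for j, b in enumerate(beta_hat) if b != 0.0]
print(support)
```

Unlike marginal testing, every update uses the residual from the joint model, so all predictors are considered simultaneously. Unlike the lasso, the retained coefficients are not shrunk toward zero, which is consistent with the abstract's remark about unbiased coefficient estimates.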

SUBMITTER: Chu BB

PROVIDER: S-EPMC7268817 | biostudies-literature | 2020 Jun

REPOSITORIES: biostudies-literature

Publications

Iterative hard thresholding in genome-wide association studies: Generalized linear models, prior weights, and double sparsity.

Chu BB, Keys KL, German CA, Zhou H, Zhou JJ, Sobel EM, Sinsheimer JS, Lange K

GigaScience, 2020-06-01, 6

PMID: 32491161

Similar Datasets

Multivariate genome-wide association analysis by iterative hard thresholding.
| S-EPMC10133532 | biostudies-literature

Iterative hard thresholding for model selection in genome-wide association studies.
| S-EPMC5696071 | biostudies-literature

A Bayesian approach for inducing sparsity in generalized linear models with multi-category response.
| S-EPMC4597416 | biostudies-literature

Catalytic prior distributions with application to generalized linear models.
| S-EPMC7275732 | biostudies-literature

Weight Smoothing for Generalized Linear Models Using a Laplace Prior.
| S-EPMC5719898 | biostudies-literature

Variable Selection with Prior Information for Generalized Linear Models via the Prior LASSO Method.
| S-EPMC4874534 | biostudies-literature

A hierarchical prior for generalized linear models based on predictions for the mean response.
| S-EPMC9566454 | biostudies-literature

Federated generalized linear mixed models for collaborative genome-wide association studies.
| S-EPMC10387571 | biostudies-literature

An Information Matrix Prior for Bayesian Analysis in Generalized Linear Models with High Dimensional Data.
| S-EPMC2909687 | biostudies-literature

GWASinlps: non-local prior based iterative SNP selection tool for genome-wide association studies.
| S-EPMC6298063 | biostudies-literature