Dataset Information

Mixed logistic regression in genome-wide association studies.

ABSTRACT:

Background

Mixed linear models (MLM) have been widely used to account for population structure in case-control genome-wide association studies, with case-control status analyzed as a quantitative phenotype. Chen et al. proved in 2016 that this approach is inappropriate in some situations and proposed GMMAT, a score test for the mixed logistic regression (MLR). However, this test does not produce estimates of the variants' effects. We propose two computationally efficient methods to estimate the variants' effects. Their properties, and those of other methods (MLM, logistic regression), are evaluated using both simulated and real genomic data from a recent GWAS in two geographically close populations in West Africa.
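
The score-test idea behind GMMAT can be illustrated in a simplified setting. The sketch below assumes a plain logistic regression with covariates X and no random effect, so it ignores the kinship structure that the mixed model handles; it is neither GMMAT nor the milorGWAS implementation, and all function names are hypothetical. The null model is fitted once. Each variant then only needs its score U and its efficient variance V, which give the statistic U^2/V (chi-square with 1 degree of freedom under the null) and a one-step approximation U/V of the variant's effect.

```python
import math
import numpy as np


def fit_null_logistic(X, y, n_iter=25, tol=1e-10):
    """Fit a fixed-effects logistic regression y ~ X by Newton-Raphson (IRLS)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        w = mu * (1.0 - mu)
        # Newton step: (X' W X)^{-1} X' (y - mu)
        step = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (y - mu))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    mu = 1.0 / (1.0 + np.exp(-X @ beta))
    return beta, mu


def score_test(X, y, g):
    """Score test of H0: genotype effect = 0, plus the one-step estimate U/V.

    Simplification: no random effect (hypothetical helper, not GMMAT).
    """
    _, mu = fit_null_logistic(X, y)
    w = mu * (1.0 - mu)
    u = g @ (y - mu)                        # score for the genotype effect
    xtwg = X.T @ (w * g)
    xtwx = X.T @ (w[:, None] * X)
    # efficient information: g'Wg minus the part explained by the covariates
    v = g @ (w * g) - xtwg @ np.linalg.solve(xtwx, xtwg)
    t = u * u / v                           # ~ chi-square(1) under H0
    p = math.erfc(math.sqrt(t / 2.0))      # upper tail of chi-square(1)
    return t, p, u / v                      # statistic, p value, one-step estimate
```

Because V is computed from the null fit alone, no model is refitted per variant, which is what keeps score-based scans cheap. U/V is a first-order approximation of the maximum likelihood estimate, so its accuracy degrades as the true effect grows.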

Results

We show that, when the disease prevalence differs between population strata, MLM is inappropriate for analyzing binary traits. MLR performs best in all circumstances. Our methods estimate the variants' effects well, with a moderate bias when effect sizes are large. Additionally, we propose a stratified QQ-plot, which improves the diagnosis of p-value inflation or deflation when population strata are not clearly identified in the sample.
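
The abstract does not spell out how the stratified QQ-plot is built. The sketch below assumes each variant already carries a stratum label (for example, a bin of some per-variant measure of allele-frequency differentiation between population groups; the labelling rule is a hypothetical input here). Within each stratum it computes QQ-plot coordinates and the genomic inflation factor lambda = median(observed chi-square) / median(chi-square with 1 df). A lambda well above 1 in one stratum but not in the others could point to residual confounding concentrated in those variants. Function names are illustrative.

```python
import math
from statistics import NormalDist, median


def qq_coordinates(pvalues):
    """Return (expected, observed) -log10 p values, both sorted in decreasing order."""
    n = len(pvalues)
    observed = sorted((-math.log10(p) for p in pvalues), reverse=True)
    # expected uniform quantiles (i - 0.5) / n, i = 1..n, on the -log10 scale
    expected = [-math.log10((i - 0.5) / n) for i in range(1, n + 1)]
    return expected, observed


def inflation_factor(pvalues):
    """Genomic inflation factor from p values in (0, 1]."""
    nd = NormalDist()
    # convert each two-sided p value back to a chi-square(1) statistic
    chi2 = [nd.inv_cdf(1.0 - p / 2.0) ** 2 for p in pvalues]
    null_median = nd.inv_cdf(0.75) ** 2          # median of chi-square(1), ~0.455
    return median(chi2) / null_median


def stratified_qq(pvalues, strata):
    """Group variants by (hypothetical) stratum label; QQ coordinates and lambda per group."""
    groups = {}
    for p, s in zip(pvalues, strata):
        groups.setdefault(s, []).append(p)
    return {s: (qq_coordinates(ps), inflation_factor(ps))
            for s, ps in sorted(groups.items())}
```

Each stratum's (expected, observed) pairs would be drawn as a separate curve on the same QQ-plot, so a deviation confined to one stratum is not diluted by the others.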

Conclusion

The two proposed methods are implemented in the R package milorGWAS, available on CRAN. Both methods scale up to at least 10,000 individuals. The same computational strategies could be applied to other models (e.g. the mixed Cox model for survival analysis).
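
The abstract does not describe the computational strategies themselves. As one generic illustration, assumed here and not taken from the paper or from milorGWAS: once a null mixed model has been fitted, its predicted random effects can be held fixed as an offset, so that each variant only requires a small fixed-effects logistic fit (Newton-Raphson) followed by a Wald test. The `offset` vector is a hypothetical input, and the function names are illustrative.

```python
import numpy as np


def logistic_with_offset(X, y, offset, n_iter=50, tol=1e-10):
    """Newton-Raphson fit of logit P(y=1) = offset + X @ beta; returns beta and standard errors."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(offset + X @ beta)))
        w = mu * (1.0 - mu)
        info = X.T @ (w[:, None] * X)            # Fisher information X' W X
        step = np.linalg.solve(info, X.T @ (y - mu))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    mu = 1.0 / (1.0 + np.exp(-(offset + X @ beta)))
    info = X.T @ ((mu * (1.0 - mu))[:, None] * X)
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se


def scan_variants(X, y, genotypes, offset):
    """Per-variant Wald estimates (beta, se).

    `offset` stands for predicted random effects from a null mixed model,
    assumed to be computed elsewhere (hypothetical input).
    """
    results = []
    for g in genotypes.T:                        # one column per variant
        beta, se = logistic_with_offset(np.column_stack([X, g]), y, offset)
        results.append((beta[-1], se[-1]))
    return results
```

Holding the offset fixed ignores the uncertainty in the predicted random effects, so this is an approximation of a full mixed-model fit rather than a replacement for it.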

SUBMITTER: Milet J

PROVIDER: S-EPMC7684894 | biostudies-literature | 2020 Nov

REPOSITORIES: biostudies-literature


Publications

Mixed logistic regression in genome-wide association studies.

Milet J, Courtin D, Garcia A, Perdry H

BMC Bioinformatics, 2020 Nov 23, issue 1

PMID: 33228527

Similar Datasets

Genome-wide association analysis by lasso penalized logistic regression.
| S-EPMC2732298 | biostudies-literature

Trinculo: Bayesian and frequentist multinomial logistic regression for genome-wide association studies of multi-category phenotypes.
| S-EPMC4908321 | biostudies-literature

GWAS on your notebook: fast semi-parallel linear and logistic regression for genome-wide association studies.
| S-EPMC3695771 | biostudies-literature

Analysis of genome-wide association data by large-scale Bayesian logistic regression.
| S-EPMC2795912 | biostudies-literature

A comparison of Cox and logistic regression for use in genome-wide association studies of cohort and case-cohort design.
| S-EPMC5520083 | biostudies-literature

LD Score regression distinguishes confounding from polygenicity in genome-wide association studies.
| S-EPMC4495769 | biostudies-literature

Robustification of Linear Regression and Its Application in Genome-Wide Association Studies.
| S-EPMC7295010 | biostudies-literature

Federated generalized linear mixed models for collaborative genome-wide association studies.
| S-EPMC10387571 | biostudies-literature

Penalized multivariate linear mixed model for longitudinal genome-wide association studies.
| S-EPMC4143695 | biostudies-literature

Further improvements to linear mixed models for genome-wide association studies.
| S-EPMC4230738 | biostudies-literature