Dataset Information

Variable Selection in Heterogeneous Datasets: A Truncated-rank Sparse Linear Mixed Model with Applications to Genome-wide Association Studies.

ABSTRACT: A fundamental and important challenge in modern datasets of ever increasing dimensionality is variable selection, which has taken on renewed interest recently due to the growth of biological and medical datasets with complex, non-i.i.d. structures. Naïvely applying classical variable selection methods such as the Lasso to such datasets may lead to a large number of false discoveries. Motivated by genome-wide association studies in genetics, we study the problem of variable selection for datasets arising from multiple subpopulations, when this underlying population structure is unknown to the researcher. We propose a unified framework for sparse variable selection that adaptively corrects for population structure via a low-rank linear mixed model. Most importantly, the proposed method does not require prior knowledge of individual relationships in the data and adaptively selects a covariance structure of the correct complexity. Through extensive experiments, we illustrate the effectiveness of this framework over existing methods. Further, we test our method on three different genomic datasets from plants, mice, and humans, and discuss the knowledge we discover with our model.

SUBMITTER: Wang H

PROVIDER: S-EPMC5889139 | biostudies-literature | 2017 Nov

REPOSITORIES: biostudies-literature

Publications

Variable Selection in Heterogeneous Datasets: A Truncated-rank Sparse Linear Mixed Model with Applications to Genome-wide Association Studies.

Haohan Wang, Bryon Aragam, Eric P. Xing

Proceedings. IEEE International Conference on Bioinformatics and Biomedicine, 2017-11-01

PMID: 29629235

Similar Datasets

Variable selection in heterogeneous datasets: A truncated-rank sparse linear mixed model with applications to genome-wide association studies. | S-EPMC10319256 | biostudies-literature

Bayesian sparse multiple regression for simultaneous rank reduction and variable selection. | S-EPMC7584295 | biostudies-literature

VARIABLE SELECTION IN LINEAR MIXED EFFECTS MODELS. | S-EPMC4026175 | biostudies-literature

Study of Bayesian variable selection method on mixed linear regression models. | S-EPMC10022788 | biostudies-literature

Variable Selection for Sparse Data with Applications to Vaginal Microbiome and Gene Expression Data. | S-EPMC9956208 | biostudies-literature

Polygenic modeling with Bayesian sparse linear mixed models. | S-EPMC3567190 | biostudies-literature

Variable Selection in Generalized Functional Linear Models. | S-EPMC4131701 | biostudies-literature

Bayesian variable selection in linear quantile mixed models for longitudinal data with application to macular degeneration. | S-EPMC7588124 | biostudies-literature

BGWAS: Bayesian variable selection in linear mixed models with nonlocal priors for genome-wide association studies. | S-EPMC10176706 | biostudies-literature

A Bayesian variable selection procedure to rank overlapping gene sets. | S-EPMC3434019 | biostudies-literature