Dataset Information

Precision Lasso: accounting for correlations and linear dependencies in high-dimensional genomic data.

ABSTRACT:

Motivation

Association studies to discover links between genetic markers and phenotypes are central to bioinformatics. Methods of regularized regression, such as variants of the Lasso, are popular for this task. Despite the good predictive performance of these methods in the average case, they suffer from unstable selections of correlated variables and inconsistent selections of linearly dependent variables. Unfortunately, as we demonstrate empirically, such problematic situations of correlated and linearly dependent variables often exist in genomic datasets and lead to under-performance of classical methods of variable selection.
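The linear-dependence problem described above can be made concrete with a toy example. The sketch below is an illustration only and is not the Precision Lasso or any code from the authors. It assumes a hypothetical three-column design in which the third column is exactly the sum of the first two, and a standard Lasso objective (half the mean squared residual plus `lam` times the L1 norm). All names (`predict`, `lasso_objective`, `beta_true`, `beta_alt`) and the value `lam=0.1` are assumptions chosen for the illustration.

```python
import random


def predict(X, beta):
    """Linear predictions for a design matrix stored as a list of rows."""
    return [sum(x_j * b_j for x_j, b_j in zip(row, beta)) for row in X]


def l1_norm(beta):
    """Sum of absolute coefficient values (the Lasso penalty term)."""
    return sum(abs(b) for b in beta)


def lasso_objective(X, y, beta, lam):
    """Half the mean squared residual plus lam times the L1 norm of beta."""
    residuals = [y_i - p_i for y_i, p_i in zip(y, predict(X, beta))]
    return 0.5 * sum(r * r for r in residuals) / len(y) + lam * l1_norm(beta)


rng = random.Random(0)

# Toy design (assumed): column 3 is exactly column 1 + column 2.
X = []
for _ in range(50):
    x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
    X.append([x1, x2, x1 + x2])

beta_true = [1.0, 1.0, 0.0]  # coefficients used to generate y
beta_alt = [0.0, 0.0, 1.0]   # different support, identical predictions
y = predict(X, beta_true)

# Both coefficient vectors fit the data equally well ...
max_gap = max(
    abs(a - b) for a, b in zip(predict(X, beta_true), predict(X, beta_alt))
)

# ... but the L1 penalty is smaller for the vector that selects only the
# dependent column, so the penalized objective prefers the "wrong" support.
obj_true = lasso_objective(X, y, beta_true, lam=0.1)
obj_alt = lasso_objective(X, y, beta_alt, lam=0.1)

print(max_gap, obj_true, obj_alt)
```

In this toy setting the data cannot distinguish the two supports, and the L1 penalty alone favours the single dependent column over the two generating columns. This is one simple way an L1-penalized selection can be inconsistent under linear dependence; the abstract's empirical claims about genomic data are not reproduced here.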

Results

To address these challenges, we propose the Precision Lasso. Precision Lasso is a Lasso variant that promotes sparse variable selection by regularization governed by the covariance and inverse covariance matrices of explanatory variables. We illustrate its capacity for stable and consistent variable selection in simulated data with highly correlated and linearly dependent variables. We then demonstrate the effectiveness of the Precision Lasso to select meaningful variables from transcriptomic profiles of breast cancer patients. Our results indicate that in settings with correlated and linearly dependent variables, the Precision Lasso outperforms popular methods of variable selection such as the Lasso, the Elastic Net and Minimax Concave Penalty (MCP) regression.

Availability and implementation

Software is available at https://github.com/HaohanWang/thePrecisionLasso.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Wang H

PROVIDER: S-EPMC6449749 | biostudies-literature | 2019 Apr

REPOSITORIES: biostudies-literature

Publications

Precision Lasso: accounting for correlations and linear dependencies in high-dimensional genomic data.

Wang H, Lengerich BJ, Aragam B, Xing EP

Bioinformatics (Oxford, England), 2019-04-01, issue 7

PMID: 30184048

Similar Datasets

Multi-kernel linear mixed model with adaptive lasso for prediction analysis on high-dimensional multi-omics data.
| S-EPMC7523642 | biostudies-literature

A non-negative spike-and-slab lasso generalized linear stacking prediction modeling method for high-dimensional omics data
| S-EPMC10953151 | biostudies-literature

The joint lasso: high-dimensional regression for group structured data.
| S-EPMC7868060 | biostudies-literature

Block HSIC Lasso: model-free biomarker detection for ultra-high dimensional data.
| S-EPMC6612810 | biostudies-literature

Extended graphical lasso for multiple interaction networks for high dimensional omics data.
| S-EPMC8528283 | biostudies-literature

Accounting for unobserved covariates with varying degrees of estimability in high-dimensional biological data.
| S-EPMC6845853 | biostudies-literature

High-dimensional genomic data bias correction and data integration using MANCIE.
| S-EPMC4833864 | biostudies-other

Doubly debiased lasso: high-dimensional inference under hidden confounding.
| S-EPMC9365063 | biostudies-literature

Multivariate linear regression of high-dimensional fMRI data with multiple target variables.
| S-EPMC6869070 | biostudies-literature

Characterizing non-linear dependencies among pairs of clinical variables and imaging data.
| S-EPMC3561932 | biostudies-literature