Dataset Information

Sparse support vector machines with L0 approximation for ultra-high dimensional omics data.

ABSTRACT: Omics data usually have ultra-high dimension (p) and small sample size (n). Standard support vector machines (SVMs), which minimize the L2 norm of the primal variables, lead to sparse solutions only for the dual variables. L1-based SVMs, which directly minimize the L1 norm, have been used for feature selection with omics data. However, most current methods solve the primal formulation of the problem directly, which is not computationally scalable: the computational complexity increases with the number of features. In addition, the L1 norm is known to be asymptotically biased and not consistent for feature selection. In this paper, we develop an efficient method for sparse support vector machines with L0 norm approximation. The proposed method approximates the L0 minimization by solving a series of L2 optimization problems, which can be formulated with dual variables. It finds the optimal solution for the p primal variables by estimating n dual variables, which is more efficient as long as the sample size is small. The L0 approximation leads to sparsity in both the dual and primal variables, and can be used for both feature and sample selection. In simulations, the proposed method selects far fewer features while achieving similar performance. We apply the proposed method to feature selection with metagenomic sequencing and gene expression data. It can identify biologically important genes and taxa efficiently.
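The idea of approximating an L0 penalty with a series of L2 problems, each solved through n dual variables, can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a squared loss (a least-squares SVM style fit to ±1 labels) in place of the hinge loss, and the function name `l0_approx_dual`, the parameters `lam`, `n_iter`, and `tol`, and the synthetic data are all hypothetical. Each pass solves a ridge problem whose penalty on feature j is scaled by the inverse of the previous squared weight, so only an n x n linear system is solved even when p is much larger than n.

```python
import numpy as np

def l0_approx_dual(X, y, lam=1.0, n_iter=50, tol=1e-8):
    """Illustrative L0 approximation by iteratively reweighted L2 steps.

    Assumption: squared loss stands in for the SVM hinge loss.
    Each step minimizes ||y - X w||^2 + lam * sum_j w_j^2 / d_j,
    with d_j = (previous w_j)^2, using the dual form
        alpha = (X D X^T + lam I)^{-1} y,   w = D X^T alpha,
    so only n dual variables are estimated per step.
    """
    n, p = X.shape
    w = np.ones(p)                        # start with all features active
    alpha = np.zeros(n)
    for _ in range(n_iter):
        d = w ** 2                        # per-feature reweighting
        K = (X * d) @ X.T                 # n x n matrix X D X^T
        alpha = np.linalg.solve(K + lam * np.eye(n), y)
        w_new = d * (X.T @ alpha)         # map dual solution back to primal
        converged = np.max(np.abs(w_new - w)) < tol
        w = w_new
        if converged:
            break
    return w, alpha

# Synthetic example: n = 40 samples, p = 500 features, 2 informative features.
rng = np.random.default_rng(0)
n, p = 40, 500
X = rng.standard_normal((n, p))
y = np.sign(X[:, 0] - X[:, 1])
w, alpha = l0_approx_dual(X, y, lam=1.0)
selected = np.flatnonzero(np.abs(w) > 1e-6)   # features with non-negligible weight
```

In this sketch, weights that shrink toward zero stay near zero in later passes because their reweighting factor also shrinks, which is what drives the sparsity in the primal variables. The threshold of 1e-6 used to read off `selected` is an arbitrary choice for illustration.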

SUBMITTER: Liu Z

PROVIDER: S-EPMC6553498 | biostudies-literature | 2019 May

REPOSITORIES: biostudies-literature

Publications

Sparse support vector machines with L0 approximation for ultra-high dimensional omics data.

Zhenqiu Liu, David Elashoff, Steven Piantadosi

Artificial Intelligence in Medicine, 2019-04-30

PMID: 31164207

Similar Datasets

Functional robust support vector machines for sparse and irregular longitudinal data.
| S-EPMC3668975 | biostudies-literature

Support Vector Machines with Disease-gene-centric Network Penalty for High Dimensional Microarray Data.
| S-EPMC2854644 | biostudies-literature

A surrogate ℓ0 sparse Cox's regression with applications to sparse high-dimensional massive sample size time-to-event data.
| S-EPMC8386178 | biostudies-literature

Reinforced Angle-based Multicategory Support Vector Machines.
| S-EPMC5120762 | biostudies-literature

Explaining Support Vector Machines: A Color Based Nomogram.
| S-EPMC5056733 | biostudies-literature

Transmembrane protein topology prediction using support vector machines.
| S-EPMC2700806 | biostudies-literature

miRBoost: boosting support vector machines for microRNA precursor classification.
| S-EPMC4408786 | biostudies-literature

3D ultrasound image segmentation using wavelet support vector machines.
| S-EPMC3360689 | biostudies-other

Multiset sparse redundancy analysis for high-dimensional omics data.
| S-EPMC6587877 | biostudies-literature

The Sparse MLE for Ultra-High-Dimensional Feature Screening.
| S-EPMC4219371 | biostudies-literature