Dataset Information

Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems.

ABSTRACT:

Background

Variable selection on high throughput biological data, such as gene expression or single nucleotide polymorphisms (SNPs), becomes inevitable to select relevant information and, therefore, to better characterize diseases or assess genetic structure. There are different ways to perform variable selection in large data sets. Statistical tests are commonly used to identify differentially expressed features for explanatory purposes, whereas Machine Learning wrapper approaches can be used for predictive purposes. In the case of multiple highly correlated variables, another option is to use multivariate exploratory approaches to give more insight into cell biology, biological pathways or complex traits.

Results

A simple extension of a sparse PLS exploratory approach is proposed to perform variable selection in a multiclass classification framework.

Conclusions

sPLS-DA has a classification performance similar to other wrapper or sparse discriminant analysis approaches on public microarray and SNP data sets. More importantly, sPLS-DA is clearly competitive in terms of computational efficiency and superior in terms of interpretability of the results via valuable graphical outputs. sPLS-DA is available in the R package mixOmics, which is dedicated to the analysis of large biological data sets.
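
The idea summarized under Results, a sparse PLS step applied to class labels, can be illustrated with a minimal sketch. This is not the mixOmics implementation and not necessarily the authors' exact algorithm. It assumes the class labels are dummy-encoded into an indicator matrix, and that sparsity comes from soft-thresholding the feature loading vector inside an alternating power iteration on the cross-product matrix. The function names and the `keep` parameter are hypothetical and chosen only for illustration.

```python
def soft_threshold(values, lam):
    """Shrink every value toward zero by lam; anything within lam of zero becomes 0."""
    out = []
    for v in values:
        shrunk = max(abs(v) - lam, 0.0)
        out.append(shrunk if v >= 0 else -shrunk)
    return out


def normalize(vec):
    """Scale a vector to unit Euclidean length (all-zero vectors are returned unchanged)."""
    norm = sum(v * v for v in vec) ** 0.5
    return [v / norm for v in vec] if norm > 0 else list(vec)


def center_columns(rows):
    """Subtract each column's mean from that column."""
    n = len(rows)
    means = [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]
    return [[r[j] - means[j] for j in range(len(means))] for r in rows]


def dummy_encode(labels):
    """One indicator column per class, in sorted class order."""
    classes = sorted(set(labels))
    return [[1.0 if lab == c else 0.0 for c in classes] for lab in labels]


def sparse_plsda_loading(X, labels, keep, n_iter=50):
    """First sparse loading vector over the features (columns) of X.

    `keep` is a hypothetical parameter: the maximum number of features
    whose loading is left non-zero.
    """
    Xc = center_columns(X)
    Yc = center_columns(dummy_encode(labels))
    n, p, k = len(Xc), len(Xc[0]), len(Yc[0])
    # Cross-product matrix M = Xc^T Yc: one row per feature, one column per class.
    M = [[sum(Xc[i][j] * Yc[i][c] for i in range(n)) for c in range(k)]
         for j in range(p)]
    # Start from the class pattern of the feature most strongly tied to the classes.
    v = normalize(max(M, key=lambda row: sum(x * x for x in row)))
    u = [0.0] * p
    for _ in range(n_iter):
        raw = [sum(M[j][c] * v[c] for c in range(k)) for j in range(p)]
        # Threshold at the (keep+1)-th largest magnitude so at most `keep` survive.
        mags = sorted((abs(r) for r in raw), reverse=True)
        lam = mags[keep] if keep < p else 0.0
        u = normalize(soft_threshold(raw, lam))
        v = normalize([sum(M[j][c] * u[j] for j in range(p)) for c in range(k)])
    return u
```

Calling `sparse_plsda_loading(X, labels, keep=2)` on a samples-by-features list of lists returns one loading per feature. The features with non-zero loadings are the ones selected for the first component. Further components would require a deflation step, which this sketch leaves out.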

SUBMITTER: Le Cao KA

PROVIDER: S-EPMC3133555 | biostudies-literature | 2011 Jun

REPOSITORIES: biostudies-literature

Publications

Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems.

Kim-Anh Lê Cao, Simon Boitard, Philippe Besse

BMC Bioinformatics, 2011-06-22

PMID: 21693065

Similar Datasets

Project description:

Background: Data-visualization methods are essential to explore and communicate meta-analytic data and results. With a large number of novel graphs proposed quite recently, a comprehensive, up-to-date overview of available graphing options for meta-analysis is unavailable.

Methods: We applied a multi-tiered search strategy to find the meta-analytic graphs proposed and introduced so far. We checked more than 150 retrievable textbooks on research synthesis methodology cover to cover, six different software programs regularly used for meta-analysis, and the entire content of two leading journals on research synthesis. In addition, we conducted Google Scholar and Google image searches and cited-reference searches of prior reviews of the topic. Retrieved graphs were categorized into a taxonomy encompassing 11 main classes, evaluated according to 24 graph-functionality features, and individually presented and described with explanatory vignettes.

Results: We ascertained more than 200 different graphs and graph variants used to visualize meta-analytic data. One half of these have accrued within the past 10 years alone. The most prevalent classes were graphs for network meta-analysis (45 displays), graphs showing combined effect(s) only (26), funnel plot-like displays (24), displays showing more than one outcome per study (19), robustness, outlier and influence diagnostics (15), study selection and p-value based displays (15), and forest plot-like displays (14). The majority of graphs (130, 62.5%) possessed a unique combination of graph features.

Conclusions: The rich and diverse set of available meta-analytic graphs offers a variety of options to display many different aspects of meta-analyses. This comprehensive overview of available graphs allows researchers to make better-informed decisions on which graphs suit their needs and therefore facilitates using the meta-analytic tool kit of graphs to its full potential. It also constitutes a roadmap for a goal-driven development of further graphical displays for research synthesis.
