Dataset Information

A combined test for feature selection on sparse metaproteomics data - an alternative to missing value imputation.

ABSTRACT: One of the difficulties encountered in the statistical analysis of metaproteomics data is the high proportion of missing values, which are usually treated by imputation. Nevertheless, imputation methods are based on restrictive assumptions regarding missingness mechanisms, namely "at random" or "not at random". To circumvent these limitations in the context of feature selection in a multi-class comparison, we propose a univariate selection method that combines a test of association between missingness and classes with a test for differences in observed intensities between classes. This approach implicitly handles both missingness mechanisms. We performed a quantitative and qualitative comparison of our procedure with imputation-based feature selection methods on two experimental data sets, as well as on simulated data with various scenarios regarding the missingness mechanisms and the nature of the difference of expression (differential intensity or differential presence). Whereas we observed similar prediction performance on the experimental data sets, the feature rankings and selections from the various imputation-based methods were strongly divergent. We showed that the combined test reaches a compromise by correlating reasonably with other methods, and remains efficient in all simulated scenarios, unlike imputation-based feature selection methods.
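The abstract does not state which statistics are used for the two component tests or how they are combined, so the following is only an illustrative sketch under assumptions, not the authors' procedure. It assumes a Pearson chi-square statistic for missingness versus class, a between-class sum of squares for the observed intensities, permutation p-values for both, and Fisher's method to combine them. All function names are hypothetical, and missing values are represented as `None`.

```python
import math
import random


def chi2_missingness(values, labels):
    """Pearson chi-square statistic of the (missing/observed) x class table."""
    n = len(values)
    n_missing = sum(v is None for v in values)
    stat = 0.0
    for c in sorted(set(labels)):
        n_c = sum(l == c for l in labels)
        miss_c = sum(v is None and l == c for v, l in zip(values, labels))
        cells = ((miss_c, n_missing), (n_c - miss_c, n - n_missing))
        for observed, column_total in cells:
            expected = n_c * column_total / n
            if expected > 0:  # skip empty columns (no missing or no observed values)
                stat += (observed - expected) ** 2 / expected
    return stat


def between_class_ss(values, labels):
    """Between-class sum of squares of fully observed intensities."""
    if not values:
        return 0.0
    grand_mean = sum(values) / len(values)
    stat = 0.0
    for c in set(labels):
        group = [v for v, l in zip(values, labels) if l == c]
        stat += len(group) * (sum(group) / len(group) - grand_mean) ** 2
    return stat


def permutation_pvalue(stat_fn, values, labels, n_perm, rng):
    """P-value obtained by shuffling class labels (assumed test choice)."""
    observed = stat_fn(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if stat_fn(values, shuffled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def combined_test(values, labels, n_perm=2000, seed=0):
    """Return (p_missingness, p_intensity, p_combined) for one feature."""
    rng = random.Random(seed)
    p_miss = permutation_pvalue(chi2_missingness, values, labels, n_perm, rng)
    # The intensity test uses observed values only, conditioning on missingness.
    observed = [(v, l) for v, l in zip(values, labels) if v is not None]
    obs_values = [v for v, _ in observed]
    obs_labels = [l for _, l in observed]
    p_int = permutation_pvalue(between_class_ss, obs_values, obs_labels, n_perm, rng)
    # Fisher's method for two p-values: -2*ln(p1*p2) follows a chi-square
    # distribution with 4 degrees of freedom under the null, whose survival
    # function has the closed form p1*p2*(1 - ln(p1*p2)).
    product = p_miss * p_int
    p_combined = product * (1.0 - math.log(product))
    return p_miss, p_int, p_combined


if __name__ == "__main__":
    labels = ["A"] * 8 + ["B"] * 8
    presence_feature = [5.0, 5.2, 4.9, 5.1, 5.3, 4.8, 5.0, 5.1] + [None] * 8
    print(combined_test(presence_feature, labels))
```

In this sketch a feature that is present in one class and absent in the other is picked up by the missingness component, while a fully observed feature with shifted intensities is picked up by the intensity component, which mirrors the "differential presence" and "differential intensity" scenarios named in the abstract. Fisher's method assumes the two p-values are independent under the null, which is a further assumption of the sketch.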

SUBMITTER: Plancade S

PROVIDER: S-EPMC9235818 | biostudies-literature | 2022

REPOSITORIES: biostudies-literature

Publications

A combined test for feature selection on sparse metaproteomics data - an alternative to missing value imputation.

Sandra Plancade, Magali Berland, Mélisande Blein-Nicolas, Olivier Langella, Ariane Bassignani, Catherine Juste

PeerJ, 2022-06-24

PMID: 35769140

Similar Datasets

Missing value imputation for epistatic MAPs. | S-EPMC2873538 | biostudies-literature

Optimal Sparse Linear Prediction for Block-missing Multi-modality Data without Imputation. | S-EPMC8612700 | biostudies-literature

The importance of batch sensitization in missing value imputation. | S-EPMC9944322 | biostudies-literature

A Simultaneous Feature Selection and Compositional Association Test for Detecting Sparse Associations in High-Dimensional Metagenomic Data. | S-EPMC8978828 | biostudies-literature

Missing value imputation using least squares techniques in contaminated matrices. | S-EPMC9036115 | biostudies-literature

Missing Value Imputation Approach for Mass Spectrometry-based Metabolomics Data. | S-EPMC5766532 | biostudies-literature

Multi-View Variational Autoencoder for Missing Value Imputation in Untargeted Metabolomics. | S-EPMC10593076 | biostudies-literature

Missing value imputation in proximity extension assay-based targeted proteomics data. | S-EPMC7735586 | biostudies-literature

Variable selection in the presence of missing data: resampling and imputation. | S-EPMC5156376 | biostudies-literature

Assessing Alternative Imputation Strategies for Infrequently Missing Items on Multi-item Scales. | S-EPMC9718541 | biostudies-literature