Dataset Information

Comparative assessment and novel strategy on methods for imputing proteomics data.

ABSTRACT: Missing values are a major issue in quantitative proteomics analysis. While many methods have been developed for imputing missing values in high-throughput proteomics data, a comparative assessment of imputation accuracy remains inconclusive, mainly because the mechanisms contributing to true missing values are complex and existing evaluation methodologies are imperfect. Moreover, few studies have provided an outlook on future methodological development. We first re-evaluate the performance of eight representative methods targeting three typical missing mechanisms. These methods are compared on both simulated and masked missing values embedded within real proteomics datasets, and performance is evaluated using three quantitative measures. We then introduce fused regularization matrix factorization, a low-rank global matrix factorization framework capable of integrating local similarity derived from additional data types. We also explore a biologically inspired latent variable modeling strategy, convex analysis of mixtures, for missing value imputation and present preliminary experimental results. While some winners emerged from our comparative assessment, the evaluation is intrinsically imperfect because performance is evaluated indirectly on artificial missing or masked values rather than on authentic missing values. Nevertheless, we show that our fused regularization matrix factorization provides a novel incorporation of external and local information, and the exploratory implementation of convex analysis of mixtures presents a biologically plausible new approach.
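
The masked-value evaluation described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy matrix, the uniformly random masking, the row-mean imputer, and the normalized RMSE score are stand-ins, since the abstract does not name the eight methods, the three missing mechanisms, or the three measures. The function names `mask_values`, `impute_row_mean`, and `nrmse` are hypothetical.

```python
import math
import random

# Toy intensity matrix (rows = proteins, columns = samples); values are made up.
toy = [
    [10.1, 10.4, 9.8, 10.0],
    [7.2, 7.9, 7.5, 7.0],
    [12.3, 12.9, 12.6, 12.1],
]


def mask_values(matrix, fraction, seed=0):
    """Hide a random fraction of the observed entries (assumed random masking)."""
    rng = random.Random(seed)
    observed = [(i, j) for i, row in enumerate(matrix)
                for j, v in enumerate(row) if v is not None]
    n_hide = max(1, int(round(fraction * len(observed))))
    hidden = rng.sample(observed, n_hide)
    masked = [list(row) for row in matrix]
    for i, j in hidden:
        masked[i][j] = None
    return masked, hidden


def impute_row_mean(matrix):
    """Baseline imputer: fill each gap with the mean of its row's observed values."""
    filled = []
    for row in matrix:
        obs = [v for v in row if v is not None]
        mean = sum(obs) / len(obs) if obs else 0.0  # fallback if a whole row is hidden
        filled.append([mean if v is None else v for v in row])
    return filled


def nrmse(truth, imputed, hidden):
    """RMSE over the hidden entries, divided by the std of their true values."""
    true_vals = [truth[i][j] for i, j in hidden]
    sq_err = [(imputed[i][j] - truth[i][j]) ** 2 for i, j in hidden]
    rmse = math.sqrt(sum(sq_err) / len(sq_err))
    mean = sum(true_vals) / len(true_vals)
    sd = math.sqrt(sum((v - mean) ** 2 for v in true_vals) / len(true_vals))
    return rmse / sd if sd > 0 else float("inf")


if __name__ == "__main__":
    masked, hidden = mask_values(toy, fraction=0.25, seed=1)
    filled = impute_row_mean(masked)
    print("hidden positions:", hidden)
    print("NRMSE of row-mean imputation:", nrmse(toy, filled, hidden))
```

Because the score is computed only on entries that were hidden deliberately, it measures recovery of masked values rather than of authentic missing values, which is the limitation the abstract points out.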

SUBMITTER: Shen M

PROVIDER: S-EPMC8776850 | biostudies-literature | 2022 Jan

REPOSITORIES: biostudies-literature

Publications

Comparative assessment and novel strategy on methods for imputing proteomics data.

Minjie Shen, Yi-Tan Chang, Chiung-Ting Wu, Sarah J. Parker, Georgia Saylor, Yizhi Wang, Guoqiang Yu, Jennifer E. Van Eyk, Robert Clarke, David M. Herrington, Yue Wang

Scientific Reports, 2022-01-20, issue 1

PMID: 35058491

Similar Datasets

Random forest-based imputation outperforms other methods for imputing LC-MS metabolomics data: a comparative study. | S-EPMC6788053 | biostudies-literature

Comparative assessment of methods for the fusion transcripts detection from RNA-Seq data. | S-EPMC4748267 | biostudies-literature

Improving Proteomics Data Reproducibility with a Dual-Search Strategy. | S-EPMC7896416 | biostudies-literature

Comparative Analysis of Quantitative Mass Spectrometric Methods for Subcellular Proteomics. | S-EPMC8063867 | biostudies-literature

Bioinformatics Methods for Mass Spectrometry-Based Proteomics Data Analysis. | S-EPMC7216093 | biostudies-literature

A pairwise strategy for imputing predictive features when combining multiple datasets. | S-EPMC9835467 | biostudies-literature

Comparative assessment of methods for the computational inference of transcript isoform abundance from RNA-seq data. | S-EPMC4511015 | biostudies-literature

Novel multivariate methods for integration of genomics and proteomics data: applications in a kidney transplant rejection study. | S-EPMC4229708 | biostudies-literature

Cardiopulmonary Exercise Test Data Averaging Methods and Preoperative Risk Assessment | 2720708 | ecrin-mdr-crc

Gene Expression Profiling of Whole Blood: A Comparative Assessment of RNA-Stabilizing Collection Methods | GSE103889 | GEO | 2019-09-20