Dataset Information

MetaClean: a machine learning-based classifier for reduced false positive peak detection in untargeted LC-MS metabolomics data.

ABSTRACT:

Introduction

Despite the availability of several pre-processing software packages, poor peak integration remains a prevalent problem in untargeted metabolomics data generated using liquid chromatography high-resolution mass spectrometry (LC-MS). As a result, the output of these packages may retain incorrectly calculated metabolite abundances that can propagate into downstream analyses.

Objectives

To address this problem, we propose a computational methodology that combines machine learning and peak quality metrics to filter out low quality peaks.

Methods

Specifically, we comprehensively and systematically compared the performance of 24 classifiers, generated by combining eight classification algorithms with three sets of peak quality metrics, on the task of distinguishing reliably integrated peaks from poorly integrated ones. These classifiers were also compared with a relative standard deviation (RSD) cut-off applied to pooled quality-control (QC) samples, which aims to remove peaks affected by analytical error.
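The RSD cut-off used as a baseline can be illustrated with a minimal sketch. It assumes RSD is computed as the sample standard deviation divided by the mean, expressed as a percentage, across pooled QC injections. The function names, the peak identifiers, and the intensity values are hypothetical and are not taken from the MetaClean package; the 30% default mirrors the threshold mentioned in the Results.

```python
from statistics import mean, stdev


def rsd_percent(intensities):
    """Return the RSD in percent: 100 * sample SD / mean."""
    m = mean(intensities)
    if m == 0:
        # A zero mean makes the ratio undefined; treat the peak as failing.
        return float("inf")
    return 100.0 * stdev(intensities) / m


def filter_by_rsd(qc_table, cutoff=30.0):
    """Keep peaks whose RSD across pooled QC injections is at most `cutoff` percent."""
    return {
        peak: values
        for peak, values in qc_table.items()
        if rsd_percent(values) <= cutoff
    }


# Hypothetical peak intensities across four pooled QC injections.
qc_table = {
    "peak_A": [100.0, 105.0, 98.0, 102.0],  # stable across injections
    "peak_B": [100.0, 10.0, 250.0, 40.0],   # highly variable
}

kept = filter_by_rsd(qc_table)
print(sorted(kept))
```

A filter of this kind only measures reproducibility across QC injections. A peak can be reproducible and still poorly integrated, which is the gap the classifiers are meant to address.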

Results

The best-performing classifier was a combination of the AdaBoost algorithm and a set of 11 peak quality metrics previously explored in untargeted metabolomics and proteomics studies. As a complementary approach, applying our framework to peaks retained after filtering by 30% RSD across pooled QC samples further identified poorly integrated peaks that RSD filtering alone did not remove. An R implementation of these classifiers and the overall computational approach is available as the MetaClean package at https://CRAN.R-project.org/package=MetaClean.
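The complementary use of the two filters can be sketched as a two-stage pipeline: an RSD cut-off first, then a peak quality classifier applied to the survivors. This is only an illustration under stated assumptions. The field names `rsd` and `quality_score` are hypothetical, and the threshold rule below is a toy stand-in for a trained classifier, not the AdaBoost model or the MetaClean API.

```python
def two_stage_filter(peaks, rsd_of, is_well_integrated, rsd_cutoff=30.0):
    """Apply an RSD cut-off, then a peak quality classifier to the survivors."""
    # Stage 1: drop peaks whose RSD across pooled QC samples exceeds the cut-off.
    passed_rsd = [p for p in peaks if rsd_of(p) <= rsd_cutoff]
    # Stage 2: keep only peaks the classifier labels as well integrated.
    return [p for p in passed_rsd if is_well_integrated(p)]


# Hypothetical peaks with a precomputed RSD (%) and a made-up quality score.
peaks = [
    {"id": "p1", "rsd": 12.0, "quality_score": 0.9},  # passes both stages
    {"id": "p2", "rsd": 18.0, "quality_score": 0.2},  # passes RSD, fails classifier
    {"id": "p3", "rsd": 45.0, "quality_score": 0.8},  # fails the RSD cut-off
]


def toy_classifier(peak):
    """Toy stand-in for a trained classifier: a fixed score threshold."""
    return peak["quality_score"] >= 0.5


retained = two_stage_filter(peaks, lambda p: p["rsd"], toy_classifier)
retained_ids = [p["id"] for p in retained]
print(retained_ids)
```

In this toy data, `p2` is the case described above: it survives the RSD cut-off but is still flagged as poorly integrated by the second stage.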

Conclusion

Our work represents an important step forward in developing an automated tool for filtering out unreliable peak integrations in untargeted LC-MS metabolomics data.

SUBMITTER: Chetnik K

PROVIDER: S-EPMC7895495 | biostudies-literature | 2020 Oct

REPOSITORIES: biostudies-literature

Publications

MetaClean: a machine learning-based classifier for reduced false positive peak detection in untargeted LC-MS metabolomics data.

Kelsey Chetnik, Lauren Petrick, Gaurav Pandey

Metabolomics: Official journal of the Metabolomic Society, 2020-10-21, issue 11

PMID: 33085002

Similar Datasets

Peak Annotation and Verification Engine for Untargeted LC-MS Metabolomics.
| S-EPMC6501219 | biostudies-literature

Mass Spectral Feature List Optimizer (MS-FLO): A Tool To Minimize False Positive Peak Reports in Untargeted Liquid Chromatography-Mass Spectroscopy (LC-MS) Data Processing.
| S-EPMC7334838 | biostudies-literature

Comprehensive Peak Characterization (CPC) in Untargeted LC-MS Analysis.
| S-EPMC8878835 | biostudies-literature

Filtering procedures for untargeted LC-MS metabolomics data.
| S-EPMC6570933 | biostudies-literature

Network Marker Selection for Untargeted LC-MS Metabolomics Data.
| S-EPMC5441461 | biostudies-literature

Deep annotation of untargeted LC-MS metabolomics data with Binner.
| S-EPMC7828469 | biostudies-literature

Improving peak detection in high-resolution LC/MS metabolomics data using preexisting knowledge and machine learning approach.
| S-EPMC4184266 | biostudies-literature

Deep Learning-Assisted Peak Curation for Large-Scale LC-MS Metabolomics.
| S-EPMC8969107 | biostudies-literature

MARS: A Multipurpose Software for Untargeted LC-MS-Based Metabolomics and Exposomics.
| S-EPMC10831794 | biostudies-literature

Untargeted LC-MS/MS analysis reveals metabolomics feature of osteosarcoma stem cell response to methotrexate.
| S-EPMC7313215 | biostudies-literature