Improving peak detection in high-resolution LC/MS metabolomics data using preexisting knowledge and machine learning approach.
Ontology highlight
ABSTRACT: Peak detection is a key step in the preprocessing of untargeted metabolomics data generated from high-resolution liquid chromatography-mass spectrometry (LC/MS). The common practice is to use filters with predetermined parameters to select peaks in the LC/MS profile. This rigid approach can cause suboptimal performance when the choice of peak model and parameters do not suit the data characteristics.Here we present a method that learns directly from various data features of the extracted ion chromatograms (EICs) to differentiate between true peak regions from noise regions in the LC/MS profile. It utilizes the knowledge of known metabolites, as well as robust machine learning approaches. Unlike currently available methods, this new approach does not assume a parametric peak shape model and allows maximum flexibility. We demonstrate the superiority of the new approach using real data. Because matching to known metabolites entails uncertainties and cannot be considered a gold standard, we also developed a probabilistic receiver-operating characteristic (pROC) approach that can incorporate uncertainties.The new peak detection approach is implemented as part of the apLCMS package available at http://web1.sph.emory.edu/apLCMS/ CONTACT: tyu8@emory.eduSupplementary data are available at Bioinformatics online.
SUBMITTER: Yu T
PROVIDER: S-EPMC4184266 | biostudies-literature | 2014 Oct
REPOSITORIES: biostudies-literature
ACCESS DATA