Dataset Information

Characterization of missing values in untargeted MS-based metabolomics data and evaluation of missing data handling strategies.


ABSTRACT: BACKGROUND: Untargeted mass spectrometry (MS)-based metabolomics data often contain missing values that reduce statistical power and can introduce bias in biomedical studies. However, a systematic assessment of the various sources of missing values and strategies to handle these data has received little attention. Missing data can occur systematically, e.g. from run day-dependent effects due to limits of detection (LOD); or it can be random as, for instance, a consequence of sample preparation. METHODS: We investigated patterns of missing data in an MS-based metabolomics experiment of serum samples from the German KORA F4 cohort (n = 1750). We then evaluated 31 imputation methods in a simulation framework and biologically validated the results by applying all imputation approaches to real metabolomics data. We examined the ability of each method to reconstruct biochemical pathways from data-driven correlation networks, and the ability of the method to increase statistical power while preserving the strength of established metabolic quantitative trait loci. RESULTS: Run day-dependent LOD-based missing data accounts for most missing values in the metabolomics dataset. Although multiple imputation by chained equations performed well in many scenarios, it is computationally and statistically challenging. K-nearest neighbors (KNN) imputation on observations with variable pre-selection showed robust performance across all evaluation schemes and is computationally more tractable. CONCLUSION: Missing data in untargeted MS-based metabolomics data occur for various reasons. Based on our results, we recommend that KNN-based imputation is performed on observations with variable pre-selection since it showed robust results in all evaluation schemes.
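As a rough illustration of the recommended approach (KNN imputation on observations with variable pre-selection), the sketch below may help. It is not the authors' implementation. The abstract does not state the number of neighbors, the distance metric, or the pre-selection criterion, so the following are all assumptions made for illustration: the function name `knn_impute_obs`, the parameters `k` and `min_abs_corr`, the use of absolute Pearson correlation for pre-selection, the root-mean-square distance, and the column-mean fallback. Missing values are represented as `None`, and the data are assumed to be already scaled.

```python
import math


def pearson(xs, ys):
    """Pearson correlation over pairs where both values are observed."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    if len(pairs) < 3:
        return 0.0
    mx = sum(x for x, _ in pairs) / len(pairs)
    my = sum(y for _, y in pairs) / len(pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def knn_impute_obs(data, k=3, min_abs_corr=0.2):
    """Impute None entries from the k nearest samples (rows).

    Hypothetical sketch: k, min_abs_corr, and the distance are assumptions.
    data is a list of rows (samples); each row lists metabolite values.
    """
    n_rows, n_cols = len(data), len(data[0])
    cols = [[row[j] for row in data] for j in range(n_cols)]
    out = [list(row) for row in data]  # leave the input untouched
    for j in range(n_cols):
        missing = [i for i in range(n_rows) if data[i][j] is None]
        if not missing:
            continue
        # Variable pre-selection: keep only metabolites correlated with column j.
        selected = [
            c for c in range(n_cols)
            if c != j and abs(pearson(cols[j], cols[c])) >= min_abs_corr
        ]
        observed = [i for i in range(n_rows) if data[i][j] is not None]
        for i in missing:
            dists = []
            for s in observed:
                shared = [
                    c for c in selected
                    if data[i][c] is not None and data[s][c] is not None
                ]
                if not shared:
                    continue
                # Root-mean-square difference over the shared, selected variables.
                d = math.sqrt(
                    sum((data[i][c] - data[s][c]) ** 2 for c in shared) / len(shared)
                )
                dists.append((d, data[s][j]))
            dists.sort(key=lambda t: t[0])
            nearest = [v for _, v in dists[:k]]
            if not nearest:
                # Fallback (assumption): mean of all observed values in column j.
                nearest = [data[s][j] for s in observed]
            out[i][j] = sum(nearest) / len(nearest) if nearest else None
    return out


# Toy example: 5 samples x 3 metabolites, one missing value.
toy = [
    [1.0, 2.0, 10.0],
    [2.0, 4.0, 9.0],
    [3.0, 6.0, 11.0],
    [4.0, None, 10.0],
    [5.0, 10.0, 9.5],
]
imputed = knn_impute_obs(toy, k=2)
```

In this toy example only the first metabolite passes the correlation filter for the second one, so the missing entry is filled from the two samples closest on that variable. Real data would need a tuned `k` and threshold, and a check on how many variables survive pre-selection.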

SUBMITTER: Do KT 

PROVIDER: S-EPMC6153696 | biostudies-literature | 2018 Sep

REPOSITORIES: biostudies-literature

Publications

Characterization of missing values in untargeted MS-based metabolomics data and evaluation of missing data handling strategies.

Do KT, Wahl S, Raffler J, Molnos S, Laimighofer M, Adamski J, Suhre K, Strauch K, Peters A, Gieger C, Langenberg C, Stewart ID, Theis FJ, Grallert H, Kastenmüller G, Krumsiek J

Metabolomics: Official journal of the Metabolomic Society, 2018 Sep 20, issue 10


Similar Datasets

S-EPMC9109373 | biostudies-literature
S-EPMC6570933 | biostudies-literature
S-EPMC5441461 | biostudies-literature
S-EPMC5072275 | biostudies-literature
S-EPMC5192446 | biostudies-literature
S-EPMC7828469 | biostudies-literature
S-EPMC9493301 | biostudies-literature
S-EPMC4101515 | biostudies-literature
S-EPMC4267269 | biostudies-literature
S-EPMC9209422 | biostudies-literature