Dataset Information

A Conversation on Data Mining Strategies in LC-MS Untargeted Metabolomics: Pre-Processing and Pre-Treatment Steps.

ABSTRACT: Untargeted metabolomic studies generate information-rich, high-dimensional, and complex datasets that remain challenging to handle and fully exploit. Despite remarkable progress in the development of tools and algorithms, the "exhaustive" extraction of information from these metabolomic datasets is still a non-trivial undertaking. A conversation on data mining strategies for maximal information extraction from metabolomic data is needed. Using a liquid chromatography-mass spectrometry (LC-MS)-based untargeted metabolomic dataset, this study explored how collection parameters in the data pre-processing step, and scaling and data transformation in the pre-treatment step, influence the statistical models generated and the subsequent feature selection. Data obtained in positive mode from an LC-MS-based untargeted metabolomic study (sorghum plants responding dynamically to infection by a fungal pathogen) were used. Raw data were pre-processed with MarkerLynx™ software (Waters Corporation, Manchester, UK). Here, two parameters were varied: the intensity threshold (50-100 counts) and the mass tolerance (0.005-0.01 Da). After pre-processing, the datasets were imported into SIMCA (Umetrics, Umeå, Sweden) for further data cleaning and statistical modeling. In addition, different scaling (unit variance, Pareto, etc.) and data transformation (log and power) methods were explored. The results showed that the pre-processing parameters (or algorithms) influence the output dataset with regard to the number of defined features. Furthermore, the study demonstrates that the pre-treatment of data prior to statistical modeling affects the subspace approximation outcome: e.g., the amount of variation in X-data that the model can explain and predict. The pre-processing and pre-treatment steps subsequently influence the number of statistically significant extracted/selected features (variables). Thus, as informed by the results, to maximize the value of untargeted metabolomic data, understanding of the data structures and exploration of different algorithms and methods (at different steps of the data analysis pipeline) might currently be the best trade-off, and possibly an epistemological imperative.
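The scaling and transformation methods named in the abstract can be illustrated with a minimal sketch. It uses the common textbook definitions (unit variance: mean-centre and divide by the standard deviation; Pareto: mean-centre and divide by the square root of the standard deviation), which may differ in detail from how SIMCA implements them. The function names, the `offset` and `exponent` parameters, and the intensity values are hypothetical and are not taken from the study.

```python
import math
import statistics


def center(column):
    """Mean-centre one feature (a column of intensities across samples)."""
    mean = statistics.fmean(column)
    return [x - mean for x in column]


def uv_scale(column):
    """Unit variance scaling: centred values divided by the sample standard deviation."""
    sd = statistics.stdev(column)  # assumes the feature is not constant (sd > 0)
    return [x / sd for x in center(column)]


def pareto_scale(column):
    """Pareto scaling: centred values divided by the square root of the standard deviation."""
    sd = statistics.stdev(column)  # assumes the feature is not constant (sd > 0)
    return [x / math.sqrt(sd) for x in center(column)]


def log_transform(column, offset=1.0):
    """Log10 transform; the offset is an assumption made here to avoid log(0)."""
    return [math.log10(x + offset) for x in column]


def power_transform(column, exponent=0.5):
    """Power transform; exponent 0.5 (square root) is an illustrative choice."""
    return [x ** exponent for x in column]


# Hypothetical intensities (counts) of one feature in four samples.
feature = [120.0, 4500.0, 980.0, 15000.0]

uv = uv_scale(feature)
pareto = pareto_scale(feature)

# After unit variance scaling every feature has the same spread (sd = 1),
# whereas Pareto scaling keeps part of the original spread (sd = sqrt(original sd)).
print("UV sd:    ", statistics.stdev(uv))
print("Pareto sd:", statistics.stdev(pareto))
print("log10:    ", log_transform(feature))
print("sqrt:     ", power_transform(feature))
```

Under these definitions, unit variance scaling gives low- and high-intensity features equal weight, while Pareto scaling only partly reduces the dominance of intense features. This is one way to see why the choice of pre-treatment can change what a model explains.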

SUBMITTER: Tugizimana F

PROVIDER: S-EPMC5192446 | biostudies-literature | 2016 Nov

REPOSITORIES: biostudies-literature

Publications

A Conversation on Data Mining Strategies in LC-MS Untargeted Metabolomics: Pre-Processing and Pre-Treatment Steps.

Tugizimana F, Steenkamp PA, Piater LA, Dubery IA

Metabolites, 2016 Nov 3, issue 4

PMID: 27827887

Similar Datasets

Filtering procedures for untargeted LC-MS metabolomics data. | S-EPMC6570933 | biostudies-literature
Network Marker Selection for Untargeted LC-MS Metabolomics Data. | S-EPMC5441461 | biostudies-literature
Deep annotation of untargeted LC-MS metabolomics data with Binner. | S-EPMC7828469 | biostudies-literature
Peak Annotation and Verification Engine for Untargeted LC-MS Metabolomics. | S-EPMC6501219 | biostudies-literature
MARS: A Multipurpose Software for Untargeted LC-MS-Based Metabolomics and Exposomics. | S-EPMC10831794 | biostudies-literature
Untargeted LC-MS/MS analysis reveals metabolomics feature of osteosarcoma stem cell response to methotrexate. | S-EPMC7313215 | biostudies-literature
MetaDB a Data Processing Workflow in Untargeted MS-Based Metabolomics Experiments. | S-EPMC4267269 | biostudies-literature
Untargeted and Targeted LC-MS/MS Based Metabolomics Study on In Vitro Culture of Phaeoacremonium Species. | S-EPMC8780456 | biostudies-literature
Untargeted LC-MS/MS Metabolomics Study of HO-AAVPA and VPA on Breast Cancer Cell Lines. | S-EPMC10572250 | biostudies-literature
Automated Annotation of Untargeted All-Ion Fragmentation LC-MS Metabolomics Data with MetaboAnnotatoR. | S-EPMC8892435 | biostudies-literature