Metabolomics

Dataset Information


Employing fingerprinting of medicinal plants by means of LC-MS and machine learning for species identification task


ABSTRACT:

A dataset of liquid chromatography-mass spectrometry measurements of medicinal plant extracts from 74 species was generated and used for training and validating plant species identification algorithms. Various strategies for data handling and feature space extraction were tested. Constrained Tucker decomposition, large-scale (more than 1500 variables) discrete Bayesian Networks, and autoencoder-based dimensionality reduction coupled with a continuous Bayes classifier and logistic regression were optimized to achieve the best accuracy. Classification algorithms based on Tucker decomposition of the original data, and on logistic regression over the representation learned by the autoencoder, showed identification accuracy of up to 96%, outperforming various implementations of Bayesian Networks. The benefits and drawbacks of the approaches used were discussed. Tolerance to changes in the data introduced by different extraction methods and equipment was tentatively tested.


The main study is reported in the current record, MTBLS688.

The Helianthus tuberosus assay is reported in MTBLS759.

INSTRUMENT(S): Liquid Chromatography MS - Alternating (LC-MS (Alternating))

SUBMITTER: Dmitry Nazarenko 

PROVIDER: MTBLS688 | MetaboLights | 2022-04-15

REPOSITORIES: MetaboLights

Dataset's files

MTBLS688 Other
FILES Other
a_MTBLS688_NEG_metabolite_profiling_mass_spectrometry.txt Txt
a_MTBLS688_POS_metabolite_profiling_mass_spectrometry.txt Txt
files-all.json Other
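
The two `a_MTBLS688_*` assay files listed above are plain-text tables. As a minimal sketch, assuming they are tab-delimited with a single header row (the usual layout for MetaboLights assay files, but not confirmed by this record), they could be read with the Python standard library alone. The helper names `read_assay_table` and `column`, the column names and the sample rows are all hypothetical and only illustrate the idea.

```python
import csv
import io


def read_assay_table(text):
    """Parse tab-delimited text into (header, rows).

    Rows are kept as lists rather than dicts, because such tables
    may repeat a column name and a dict would silently drop repeats.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    rows = [row for row in reader if row]  # skip empty lines
    if not rows:
        return [], []
    return rows[0], rows[1:]


def column(header, rows, name):
    """Return the values of the first column called `name`."""
    idx = header.index(name)
    return [row[idx] for row in rows]


# Made-up sample standing in for the contents of an assay file
# (assumption: these column names and values are not from MTBLS688).
sample = (
    "Sample Name\tRaw Spectral Data File\n"
    "plant_001\tplant_001_neg.mzML\n"
    "plant_002\tplant_002_neg.mzML\n"
)

header, rows = read_assay_table(sample)
sample_names = column(header, rows, "Sample Name")
print(sample_names)
```

For a downloaded file, the same function can be applied to the text returned by `open(path, encoding="utf-8").read()`, where `path` points at the local copy.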

Publications

Employing fingerprinting of medicinal plants by means of LC-MS and machine learning for species identification task.

Pavel Kharyuk, Dmitry Nazarenko, Ivan Oseledets, Igor Rodin, Oleg Shpigun, Andrey Tsitsilin, Mikhail Lavrentyev

Scientific Reports, 2018-11-19, issue 1


A dataset of liquid chromatography-mass spectrometry measurements of medicinal plant extracts from 74 species was generated and used for training and validating plant species identification algorithms. Various strategies for data handling and feature space extraction were tested. Constrained Tucker decomposition, large-scale (more than 1500 variables) discrete Bayesian Networks and autoencoder based dimensionality reduction coupled with continuous Bayes classifier and logistic regression were op...

Similar Datasets

2013-01-22 | GSE39057 | GEO
2013-01-22 | GSE39040 | GEO
2013-01-22 | GSE39055 | GEO
2013-01-22 | GSE39052 | GEO
2013-01-22 | E-GEOD-39040 | biostudies-arrayexpress
2013-01-22 | E-GEOD-39052 | biostudies-arrayexpress
2013-01-22 | E-GEOD-39055 | biostudies-arrayexpress
2013-01-22 | E-GEOD-39057 | biostudies-arrayexpress
2021-10-26 | ST002015 | MetabolomicsWorkbench
2022-04-04 | GSE199668 | GEO