Dataset Information

Improved classification accuracy in 1- and 2-dimensional NMR metabolomics data using the variance stabilising generalised logarithm transformation.

ABSTRACT:

Background

Classifying nuclear magnetic resonance (NMR) spectra is a crucial step in many metabolomics experiments. Since several multivariate classification techniques depend upon the variance of the data, it is important to first minimise any contribution from unwanted technical variance arising from sample preparation and analytical measurements, and thereby maximise any contribution from wanted biological variance between different classes. The generalised logarithm (glog) transform was developed to stabilise the variance in DNA microarray datasets, but has rarely been applied to metabolomics data. In particular, it has not been rigorously evaluated against other scaling techniques used in metabolomics, nor tested on all forms of NMR spectra including 1-dimensional (1D) 1H, projections of 2D 1H, 1H J-resolved (pJRES), and intact 2D J-resolved (JRES).

Results

Here, the effects of the glog transform are compared against two commonly used variance stabilising techniques, autoscaling and Pareto scaling, as well as unscaled data. The four methods are evaluated in terms of the effects on the variance of NMR metabolomics data and on the classification accuracy following multivariate analysis, the latter achieved using principal component analysis followed by linear discriminant analysis. For two of three datasets analysed, classification accuracies were highest following glog transformation: 100% accuracy for discriminating 1D NMR spectra of hypoxic and normoxic invertebrate muscle, and 100% accuracy for discriminating 2D JRES spectra of fish livers sampled from two rivers. For the third dataset, pJRES spectra of urine from two breeds of dog, the glog transform and autoscaling achieved equal highest accuracies. Additionally, we extended the glog algorithm to effectively suppress noise, which proved critical for the analysis of 2D JRES spectra.
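The four treatments compared above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the abstract does not give formulas, so the sketch assumes the commonly cited glog form z = ln(y + sqrt(y^2 + lambda)), and it assumes the extended version adds an intensity offset y0 before transforming. The function names, the `lam` and `y0` parameters, and the toy matrix are all hypothetical; in practice lambda would be estimated from technical replicates rather than set by hand.

```python
import math
from statistics import mean, stdev


def glog(y, lam, y0=0.0):
    """Generalised log of one intensity (assumed form, see lead-in).

    lam > 0 is the transform parameter; y0 is an assumed offset for the
    'extended' variant. With y0 = 0 this is the plain glog.
    """
    d = y - y0
    return math.log(d + math.sqrt(d * d + lam))


def scale(spectra, method, lam=1.0, y0=0.0):
    """Scale a samples-by-bins matrix given as a list of rows.

    method: 'none', 'glog', 'auto' (divide by SD) or 'pareto' (divide by sqrt(SD)).
    Assumes every bin varies across samples (non-zero SD).
    """
    if method == "none":
        return [list(row) for row in spectra]
    if method == "glog":
        return [[glog(v, lam, y0) for v in row] for row in spectra]
    if method not in ("auto", "pareto"):
        raise ValueError(f"unknown method: {method}")
    columns = list(zip(*spectra))           # one tuple per spectral bin
    mus = [mean(c) for c in columns]
    sds = [stdev(c) for c in columns]       # sample standard deviation
    divisors = sds if method == "auto" else [math.sqrt(s) for s in sds]
    return [[(v - m) / d for v, m, d in zip(row, mus, divisors)]
            for row in spectra]


# Toy data: 3 samples x 2 bins, one low-intensity and one high-intensity bin.
demo = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]
for name in ("none", "auto", "pareto", "glog"):
    print(name, scale(demo, name))
```

Under the assumed form, the transform is close to linear for intensities near the offset and approaches ln(2y) for large intensities, which is why it compresses strong peaks without inflating near-zero noise the way a plain logarithm would. Autoscaling gives every bin unit variance, while Pareto scaling only partly equalises the bins. The scaled matrix would then be passed to principal component analysis followed by linear discriminant analysis, which is outside this sketch.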

Conclusion

We have demonstrated that the glog and extended glog transforms stabilise the technical variance in NMR metabolomics datasets. This significantly improves the discrimination between sample classes and has resulted in higher classification accuracies compared to unscaled, autoscaled or Pareto-scaled data. Additionally, we have confirmed the broad applicability of the glog approach using three disparate datasets from different biological samples using 1D NMR spectra, 1D projections of 2D JRES spectra, and intact 2D JRES spectra.

SUBMITTER: Parsons HM

PROVIDER: S-EPMC1965488 | biostudies-literature | 2007 Jul

REPOSITORIES: biostudies-literature

Publications

Improved classification accuracy in 1- and 2-dimensional NMR metabolomics data using the variance stabilising generalised logarithm transformation.

Parsons HM, Ludwig C, Günther UL, Viant MR

BMC Bioinformatics, 2007 Jul 2

PMID: 17605789

Similar Datasets

Blending Samples to Increase Accuracy and Precision of 1H NMR Urine Metabolomics.
| S-EPMC11325295 | biostudies-literature

A comparative evaluation of the generalised predictive ability of eight machine learning algorithms across ten clinical metabolomics data sets for binary classification.
| S-EPMC6856029 | biostudies-literature

Deconvolution of two-dimensional NMR spectra by fast maximum likelihood reconstruction: application to quantitative metabolomics.
| S-EPMC3114465 | biostudies-literature

Bacterial Substrate Transformation Tracked by Stable-Isotope-Guided NMR Metabolomics: Application in a Natural Aquatic Microbial Community.
| S-EPMC5746732 | biostudies-literature

Joint Adaptive Mean-Variance Regularization and Variance Stabilization of High Dimensional Data.
| S-EPMC3375876 | biostudies-literature

HoxPred: automated classification of Hox proteins using combinations of generalised profiles.
| S-EPMC1965487 | biostudies-literature

Model-based variance-stabilizing transformation for Illumina microarray data.
| S-EPMC2241869 | biostudies-literature

Sources of Variance in the Accuracy of Interviewer Observations.
| S-EPMC6905517 | biostudies-literature

Error Variance Estimation in Ultrahigh-Dimensional Additive Models.
| S-EPMC6052885 | biostudies-literature

Classification of Raw Stingless Bee Honeys by Bee Species Origins Using the NMR- and LC-MS-Based Metabolomics Approach.
| S-EPMC6225217 | biostudies-literature