Dataset Information

Centering, scaling, and transformations: improving the biological information content of metabolomics data.

ABSTRACT: BACKGROUND: Extracting relevant biological information from large data sets is a major challenge in functional genomics research. Different aspects of the data hamper their biological interpretation. For instance, 5000-fold differences in concentration for different metabolites are present in a metabolomics data set, while these differences are not proportional to the biological relevance of these metabolites. However, data analysis methods are not able to make this distinction. Data pretreatment methods can correct for aspects that hinder the biological interpretation of metabolomics data sets by emphasizing the biological information in the data set and thus improving their biological interpretability. RESULTS: Different data pretreatment methods, i.e. centering, autoscaling, pareto scaling, range scaling, vast scaling, log transformation, and power transformation, were tested on a real-life metabolomics data set. They were found to greatly affect the outcome of the data analysis and thus the rank of the, from a biological point of view, most important metabolites. Furthermore, the stability of the rank, the influence of technical errors on data analysis, and the preference of data analysis methods for selecting highly abundant metabolites were affected by the data pretreatment method used prior to data analysis. CONCLUSION: Different pretreatment methods emphasize different aspects of the data and each pretreatment method has its own merits and drawbacks. The choice for a pretreatment method depends on the biological question to be answered, the properties of the data set and the data analysis method selected. For the explorative analysis of the validation data set used in this study, autoscaling and range scaling performed better than the other pretreatment methods. 
That is, range scaling and autoscaling were able to remove the dependence of the rank of the metabolites on the average concentration and the magnitude of the fold changes and showed biologically sensible results after PCA (principal component analysis). In conclusion, selecting a proper data pretreatment method is an essential step in the analysis of metabolomics data and greatly affects the metabolites that are identified to be the most important.
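
The abstract names the pretreatment methods without defining them. The following is a minimal sketch, assuming the commonly used definitions of each method applied to one metabolite at a time; the function name `pretreat`, the method labels, and the toy concentrations are illustrative assumptions and are not taken from this record. The log and power variants here assume a base-10 logarithm and a square root, each followed by centering.

```python
import math
import statistics


def pretreat(values, method):
    """Pretreat the concentrations of ONE metabolite measured across samples.

    Assumptions (not stated in the abstract): sample standard deviation,
    log10 for the log transformation, square root for the power
    transformation. A constant metabolite (sd == 0) or non-positive
    concentrations (for "log") are not handled in this sketch.
    """
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    centered = [v - mean for v in values]

    if method == "centering":
        return centered
    if method == "autoscaling":  # unit variance per metabolite
        return [c / sd for c in centered]
    if method == "pareto":  # divide by the square root of the sd
        return [c / math.sqrt(sd) for c in centered]
    if method == "range":  # divide by (max - min)
        spread = max(values) - min(values)
        return [c / spread for c in centered]
    if method == "vast":  # autoscaling weighted by mean / sd
        return [(c / sd) * (mean / sd) for c in centered]
    if method == "log":  # log10, then centering
        logged = [math.log10(v) for v in values]
        log_mean = statistics.fmean(logged)
        return [x - log_mean for x in logged]
    if method == "power":  # square root, then centering
        rooted = [math.sqrt(v) for v in values]
        root_mean = statistics.fmean(rooted)
        return [x - root_mean for x in rooted]
    raise ValueError(f"unknown method: {method}")


# Two hypothetical metabolites with the same relative pattern but a
# 1000-fold difference in abundance.
high = [1000.0, 2000.0, 3000.0]
low = [1.0, 2.0, 3.0]

centered_high = pretreat(high, "centering")
centered_low = pretreat(low, "centering")
auto_high = pretreat(high, "autoscaling")
auto_low = pretreat(low, "autoscaling")
range_high = pretreat(high, "range")
range_low = pretreat(low, "range")
```

With these toy values, centering leaves the abundant metabolite with deviations 1000 times larger than the scarce one, whereas autoscaling and range scaling map both onto the same values. This is the sense in which those two methods remove the dependence on average concentration.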

SUBMITTER: van den Berg RA

PROVIDER: S-EPMC1534033 | biostudies-literature | 2006

REPOSITORIES: biostudies-literature

Publications

Centering, scaling, and transformations: improving the biological information content of metabolomics data.

Robert A van den Berg, Huub C J Hoefsloot, Johan A Westerhuis, Age K Smilde, Mariët J van der Werf

BMC Genomics, 2006-06-08

PMID: 16762068

Similar Datasets

New scaling relation for information transfer in biological networks.
| S-EPMC4707865 | biostudies-literature

Entropy-scaling search of massive biological data.
| S-EPMC4591002 | biostudies-literature

Information content-based Gene Ontology functional similarity measures: which one to use for a given biological data type?
| S-EPMC4256219 | biostudies-literature

Compressing atmospheric data into its real information content
| S-EPMC10766530 | biostudies-literature

Improving CLIP-seq data analysis by incorporating transcript information.
| S-EPMC7745353 | biostudies-literature

Information Integration from Semantically Heterogeneous Biological Data Sources.
| S-EPMC2929138 | biostudies-literature

Adaptive informatics for multifactorial and high-content biological data.
| S-EPMC3105758 | biostudies-literature

Scaling and shear transformations capture beak shape variation in Darwin's finches.
| S-EPMC2840476 | biostudies-literature

Scaling theory for information networks.
| S-EPMC2607348 | biostudies-literature

Temporal scaling in information propagation.
| S-EPMC4061555 | biostudies-literature