
Dataset Information


Euclidean distance-optimized data transformation for cluster analysis in biomedical data (EDOtrans)


ABSTRACT:

Background

Data transformations are commonly used in bioinformatics data processing for data projection and clustering. The most commonly used metric, the Euclidean distance, is not scale invariant. It is therefore occasionally inappropriate for complex, e.g., multimodally distributed variables, and may negatively affect the results of cluster analysis. Specifically, the Euclidean distance is defined as the square root of the sum of squared differences between data points. Because of this squaring, differences smaller than 1 shrink and differences larger than 1 grow, so the value 1 implicitly defines a limit between distances within clusters and distances between clusters (inter-cluster distances).
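To make the role of the value 1 concrete, the following minimal Python sketch computes the squared per-coordinate terms inside a Euclidean distance. It is illustrative only: the function names are hypothetical and not part of EDOtrans, and the point coordinates are invented. It shows that a difference of 0.5 shrinks when squared while a difference of 2 grows, and that rescaling one coordinate (e.g., a change of units) pushes its differences across 1 and changes its weight in the distance.

```python
import math

def euclidean(a, b):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def squared_terms(a, b):
    """Per-coordinate contributions (x - y)**2 before the square root is taken."""
    return [(x - y) ** 2 for x, y in zip(a, b)]

# Hypothetical points: a difference of 0.5 shrinks, a difference of 2.0 grows.
a, b = (0.0, 0.0), (0.5, 2.0)
print(squared_terms(a, b))
print(euclidean(a, b))

# Same points with the first coordinate multiplied by 10 (e.g. other units):
# its difference now exceeds 1 and dominates the distance.
a10, b10 = (0.0, 0.0), (5.0, 2.0)
print(squared_terms(a10, b10))
print(euclidean(a10, b10))
```

Only the relative weight of the coordinates changes under rescaling, which is what "not scale invariant" means for this metric.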

Methods

The Euclidean distances within a standard normal distribution (N(0,1)) follow a N(0,
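As a hedged illustration of the kind of fact this Methods statement builds on, the following Python sketch draws a hypothetical sample from N(0,1) and computes all signed pairwise differences. It does not reproduce the authors' derivation. It only shows the standard result that the difference of two independent N(0,1) values has variance 2, i.e., standard deviation √2.

```python
import math
import random
import statistics

def pairwise_differences(values):
    """All signed differences x_i - x_j with i != j (both orderings)."""
    return [xi - xj
            for i, xi in enumerate(values)
            for j, xj in enumerate(values)
            if i != j]

random.seed(42)
sample = [random.gauss(0.0, 1.0) for _ in range(500)]  # draws from N(0, 1)
diffs = pairwise_differences(sample)

# Both orderings are included, so the differences are symmetric around 0
# and their standard deviation equals their root mean square.
sd_diffs = math.sqrt(sum(d * d for d in diffs) / len(diffs))

print(f"sample sd:         {statistics.stdev(sample):.3f}")
print(f"sd of differences: {sd_diffs:.3f}")
print(f"sqrt(2):           {math.sqrt(2):.3f}")
```

When both orderings are counted, the mean squared pairwise difference equals exactly twice the sample variance. The second printed value is therefore √2 times the first, and close to √2 because the sample standard deviation is close to 1.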

Results

A simulation study and applications to known real data examples showed that the proposed EDO scaling method is generally useful. In terms of cluster accuracy, adjusted Rand index and Dunn's index, the clustering results outperformed those obtained with the classical alternatives. Finally, the EDO transformation was applied to cluster a high-dimensional genomic dataset of gene expression data from multiple breast cancer tissue samples. The proposed approach gave better results than classical methods and was also compared with pooled variable scaling.
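Two of the evaluation criteria named above can be sketched from their textbook definitions. The following Python code is an illustration under stated assumptions, not the evaluation code used in the study. It computes the Hubert-Arabie adjusted Rand index from a contingency table, and one common variant of Dunn's index: the smallest Euclidean distance between points of different clusters divided by the largest within-cluster diameter. The function names and the toy inputs are hypothetical. Cluster accuracy, which requires matching cluster labels to classes, is not shown.

```python
import math
from collections import Counter
from itertools import combinations

def comb2(n):
    """Number of unordered pairs among n items."""
    return n * (n - 1) // 2

def adjusted_rand_index(labels_true, labels_pred):
    """Hubert-Arabie adjusted Rand index computed from the contingency table."""
    n = len(labels_true)
    sum_ij = sum(comb2(c) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_a = sum(comb2(c) for c in Counter(labels_true).values())
    sum_b = sum(comb2(c) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / comb2(n)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. a single cluster in both
        return 1.0
    return (sum_ij - expected) / (max_index - expected)

def dunn_index(points, labels):
    """Min between-cluster point distance / max within-cluster diameter."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("Dunn's index needs at least two clusters")
    diameter = max(
        (math.dist(p, q)
         for members in clusters.values()
         for p, q in combinations(members, 2)),
        default=0.0,
    )
    if diameter == 0.0:
        raise ValueError("all clusters have zero diameter")
    separation = min(
        math.dist(p, q)
        for c1, c2 in combinations(clusters.values(), 2)
        for p in c1
        for q in c2
    )
    return separation / diameter

# Toy usage: a relabelled but identical partition, and a partial agreement.
print(adjusted_rand_index([0, 0, 0, 1, 1, 1], [1, 1, 1, 0, 0, 0]))
print(adjusted_rand_index([0, 0, 1, 1], [0, 0, 0, 1]))
print(dunn_index([(0, 0), (0, 1), (5, 0), (5, 1)], [0, 0, 1, 1]))
```

Higher values are better for both indices. The adjusted Rand index is invariant to renaming cluster labels, and larger Dunn's index values indicate compact, well-separated clusters.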

Conclusions

For multivariate data analysis procedures, the EDO transformation is proposed as a better alternative to the established z-standardization, especially for nontrivially distributed data. The "EDOtrans" R package is available at https://cran.r-project.org/package=EDOtrans.
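For reference, the established z-standardization that EDO is proposed to replace can be sketched in a few lines of Python. This is only the baseline. The EDO transformation itself is defined in the full article and is not sketched here, and the bimodal values below are hypothetical. With such data the standard deviation mainly reflects the gap between the modes, so neighbouring values within a mode end up much less than 1 apart after scaling.

```python
import statistics

def z_standardize(values):
    """Classical z-standardization: (x - mean) / standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation, n - 1 denominator
    return [(v - mean) / sd for v in values]

# Hypothetical bimodal variable: two tight groups far apart.
bimodal = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
z = z_standardize(bimodal)

# Within-mode neighbours are now far less than 1 apart, whereas the
# step between the two modes exceeds 1.
print([round(v, 2) for v in z])
print(round(z[1] - z[0], 3), round(z[3] - z[2], 3))
```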

Supplementary Information

The online version contains supplementary material available at 10.1186/s12859-022-04769-w.

SUBMITTER: Ultsch A 

PROVIDER: S-EPMC9202178 | biostudies-literature | 2022 Jan

REPOSITORIES: biostudies-literature

Similar Datasets

| S-EPMC6232797 | biostudies-literature
| S-EPMC6179360 | biostudies-literature
| S-EPMC2779860 | biostudies-literature
| S-EPMC9812310 | biostudies-literature
| S-EPMC9977176 | biostudies-literature
| S-EPMC8934446 | biostudies-literature
| S-EPMC4098708 | biostudies-literature
| S-EPMC8605398 | biostudies-literature
| S-EPMC3760213 | biostudies-literature
| S-EPMC4044484 | biostudies-literature