Dataset Information

Flexible mixture modeling via the multivariate t distribution with the Box-Cox transformation: an alternative to the skew-t distribution.


ABSTRACT: Cluster analysis is the automated search for groups of homogeneous observations in a data set. A popular modeling approach for clustering is based on finite normal mixture models, which assume that each cluster is modeled as a multivariate normal distribution. However, the normality assumption that each component is symmetric is often unrealistic. Furthermore, normal mixture models are not robust against outliers; they often require extra components for modeling outliers and/or give a poor representation of the data. To address these issues, we propose a new class of distributions, multivariate t distributions with the Box-Cox transformation, for mixture modeling. This class of distributions generalizes the normal distribution with the more heavy-tailed t distribution, and introduces skewness via the Box-Cox transformation. As a result, this provides a unified framework to simultaneously handle outlier identification and data transformation, two interrelated issues. We describe an Expectation-Maximization algorithm for parameter estimation along with transformation selection. We demonstrate the proposed methodology with three real data sets and simulation studies. Compared with a wealth of approaches including the skew-t mixture model, the proposed t mixture model with the Box-Cox transformation performs favorably in terms of accuracy in the assignment of observations, robustness against model misspecification, and selection of the number of components.
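To make the idea in the abstract concrete, the following is a minimal univariate sketch, not the authors' implementation. It assumes the standard Box-Cox transformation for positive data (the paper's exact transformation and its multivariate form may differ), and the function names box_cox, t_boxcox_logpdf, and outlier_weight are hypothetical. It shows how the density of an observation picks up a Jacobian term from the transformation, and how the usual E-step weight of a t mixture, (nu + 1) / (nu + z^2) in one dimension, shrinks for observations far from the component center.

```python
import math


def box_cox(y, lam):
    """Standard Box-Cox transform for y > 0 (assumed form); lam = 0 gives log(y)."""
    if y <= 0:
        raise ValueError("the standard Box-Cox transform needs y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


def t_boxcox_logpdf(y, mu, sigma, nu, lam):
    """Log density of y when box_cox(y, lam) follows a t distribution
    with location mu, scale sigma, and nu degrees of freedom."""
    z = (box_cox(y, lam) - mu) / sigma
    log_t = (
        math.lgamma((nu + 1) / 2)
        - math.lgamma(nu / 2)
        - 0.5 * math.log(nu * math.pi)
        - math.log(sigma)
        - (nu + 1) / 2 * math.log1p(z * z / nu)
    )
    # Jacobian of the transform: d/dy box_cox(y, lam) = y ** (lam - 1)
    log_jacobian = (lam - 1.0) * math.log(y)
    return log_t + log_jacobian


def outlier_weight(y, mu, sigma, nu, lam):
    """Univariate E-step weight of a t component; small values flag outliers."""
    z = (box_cox(y, lam) - mu) / sigma
    return (nu + 1.0) / (nu + z * z)
```

With lam = 1 the transformation only shifts the data, so the sketch reduces to an ordinary t component; values of lam below 1 compress the right tail, which is how skewness enters on the original scale.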

SUBMITTER: Lo K 

PROVIDER: S-EPMC3223965 | biostudies-literature | 2012 Jan

REPOSITORIES: biostudies-literature

Publications

Flexible mixture modeling via the multivariate t distribution with the Box-Cox transformation: an alternative to the skew-t distribution.

Kenneth Lo, Raphael Gottardo

Statistics and Computing, 2012-01-01, issue 1



Similar Datasets

S-EPMC9071306 | biostudies-literature
S-EPMC5272780 | biostudies-literature
S-EPMC5517826 | biostudies-literature
S-EPMC3795830 | biostudies-literature
S-EPMC7448754 | biostudies-literature
S-EPMC8099438 | biostudies-literature
S-EPMC5451954 | biostudies-literature
S-EPMC6461541 | biostudies-literature
S-EPMC6114938 | biostudies-literature
S-EPMC8521567 | biostudies-literature