Abstract
In model-based clustering, the density of each cluster is usually assumed to be a certain basic parametric distribution, for example, the normal distribution. In practice, it is often difficult to decide which parametric distribution is suitable to characterize a cluster, especially for multivariate data. Moreover, the densities of individual clusters may be multimodal themselves, and therefore cannot be accurately modeled by basic parametric distributions. This article explores a clustering approach that models each cluster by a mixture of normals. The resulting overall model is a multilayer mixture of normals. Algorithms to estimate the model and perform clustering are developed based on the classification maximum likelihood (CML) and mixture maximum likelihood (MML) criteria. BIC and ICL-BIC are examined for choosing the number of normal components per cluster. Experiments on both simulated and real data are presented.
Original language | English (US) |
---|---|
Pages (from-to) | 547-568 |
Number of pages | 22 |
Journal | Journal of Computational and Graphical Statistics |
Volume | 14 |
Issue number | 3 |
DOIs | |
State | Published - Sep 2005 |
All Science Journal Classification (ASJC) codes
- Statistics and Probability
- Discrete Mathematics and Combinatorics
- Statistics, Probability and Uncertainty