A new variable selection algorithm is developed for clustering based on mode association. In conventional mixture-model-based clustering, each mixture component is treated as one cluster, and the separation between clusters is usually measured by the ratio of between-component to within-component dispersion. In this article, we allow one cluster to contain several components, depending on whether they merge into one mode. The extent of separation between clusters is quantified using critical points on the ridgeline between two modes, which reflects the exact geometry of the density function. The computational foundation consists of the recently developed modal expectation-maximization (MEM) algorithm, which finds the modes of a Gaussian mixture density, and the ridgeline expectation-maximization (REM) algorithm, which finds the ridgeline passing through the critical points of the mixture density of two unimodal clusters. Forward selection is used to find a subset of variables that maximizes an aggregated index of pairwise cluster separability. Theoretical analysis of the procedure is provided. We experiment with both simulated and real datasets and compare the method with several state-of-the-art variable selection algorithms. Supplemental materials, including an R package, datasets, and appendices for proofs, are available online.
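To make the mode-finding step concrete, the following is a minimal Python sketch of an MEM-style ascent for a Gaussian mixture. It is not the authors' R package: the function names `gaussian_diag_logpdf` and `mem_mode`, the tolerance settings, and the restriction to diagonal covariance matrices are all assumptions made here to keep the example short and dependency-free. Each iteration computes the posterior probability of every component at the current point, then moves to the point maximizing the posterior-weighted sum of component log densities, which for diagonal covariances is a precision-weighted mean.

```python
import math


def gaussian_diag_logpdf(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (variances in `var`)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def mem_mode(x0, weights, means, variances, tol=1e-10, max_iter=1000):
    """Ascend from x0 to a local mode of a diagonal-covariance Gaussian mixture.

    Hypothetical helper for illustration; assumes strictly positive weights
    and variances.
    """
    x = list(x0)
    for _ in range(max_iter):
        # E-step: posterior probability of each component at the current point,
        # computed in log space for numerical stability.
        logs = [
            math.log(w) + gaussian_diag_logpdf(x, m, v)
            for w, m, v in zip(weights, means, variances)
        ]
        top = max(logs)
        post = [math.exp(l - top) for l in logs]
        total = sum(post)
        post = [p / total for p in post]

        # M-step: maximize sum_k post_k * log N(x; mean_k, var_k) over x.
        # With diagonal covariances this is solved coordinate by coordinate.
        new_x = []
        for j in range(len(x)):
            num = sum(p * m[j] / v[j] for p, m, v in zip(post, means, variances))
            den = sum(p / v[j] for p, v in zip(post, variances))
            new_x.append(num / den)

        shift = max(abs(a - b) for a, b in zip(new_x, x))
        x = new_x
        if shift < tol:
            break
    return x


# Two equal-weight, unit-variance components in one dimension whose means are
# only 1 apart: the mixture has a single mode, so both starts reach the same point.
merged_a = mem_mode([-3.0], [0.5, 0.5], [[0.0], [1.0]], [[1.0], [1.0]])
merged_b = mem_mode([4.0], [0.5, 0.5], [[0.0], [1.0]], [[1.0], [1.0]])

# Two well-separated components in two dimensions: starts near each mean
# ascend to different modes.
sep_a = mem_mode([0.5, -0.2], [0.5, 0.5], [[0.0, 0.0], [5.0, 5.0]],
                 [[1.0, 1.0], [1.0, 1.0]])
sep_b = mem_mode([4.0, 6.0], [0.5, 0.5], [[0.0, 0.0], [5.0, 5.0]],
                 [[1.0, 1.0], [1.0, 1.0]])
```

In the first example the two components merge into one mode and would therefore be treated as a single cluster under mode association; in the second they remain distinct. Handling full covariance matrices would require replacing the coordinate-wise update with a linear solve.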
All Science Journal Classification (ASJC) codes
- Statistics and Probability
- Discrete Mathematics and Combinatorics
- Statistics, Probability and Uncertainty