TY - JOUR
T1 - Block-Wise Variable Selection for Clustering Via Latent States of Mixture Models
AU - Seo, Beomseok
AU - Lin, Lin
AU - Li, Jia
N1 - Funding Information:
The research is supported by the National Science Foundation under grant number DMS-2013905. We thank the reviewers and editors for detailed and constructive comments.
Publisher Copyright:
© 2021 American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America.
PY - 2022
Y1 - 2022
N2 - Mixture modeling is a major paradigm for clustering in statistics. In this article, we develop a new block-wise variable selection method for clustering by exploiting the latent states of the hidden Markov model on variable blocks or the Gaussian mixture model. The variable blocks are formed by depth-first search on a dendrogram created based on the mutual information between any pair of variables. It is demonstrated that the latent states of the variable blocks together with the mixture model parameters can represent the original data effectively and much more compactly. We thus cluster the data using the latent states and select variables according to the relationship between the states and the clusters. As true class labels are unknown in the unsupervised setting, we first generate more refined clusters, namely, semi-clusters, for variable selection and then determine the final clusters based on the dimension-reduced data. Experiments on simulated and real data show that the new method is highly competitive in terms of clustering accuracy compared with several widely used methods. Supplementary materials for this article are available online.
AB - Mixture modeling is a major paradigm for clustering in statistics. In this article, we develop a new block-wise variable selection method for clustering by exploiting the latent states of the hidden Markov model on variable blocks or the Gaussian mixture model. The variable blocks are formed by depth-first search on a dendrogram created based on the mutual information between any pair of variables. It is demonstrated that the latent states of the variable blocks together with the mixture model parameters can represent the original data effectively and much more compactly. We thus cluster the data using the latent states and select variables according to the relationship between the states and the clusters. As true class labels are unknown in the unsupervised setting, we first generate more refined clusters, namely, semi-clusters, for variable selection and then determine the final clusters based on the dimension-reduced data. Experiments on simulated and real data show that the new method is highly competitive in terms of clustering accuracy compared with several widely used methods. Supplementary materials for this article are available online.
UR - http://www.scopus.com/inward/record.url?scp=85119342539&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85119342539&partnerID=8YFLogxK
U2 - 10.1080/10618600.2021.1982724
DO - 10.1080/10618600.2021.1982724
M3 - Article
AN - SCOPUS:85119342539
SN - 1061-8600
VL - 31
SP - 138
EP - 150
JO - Journal of Computational and Graphical Statistics
JF - Journal of Computational and Graphical Statistics
IS - 1
ER -