TY - GEN
T1 - Scaling up discrete distribution clustering using ADMM
AU - Ye, Jianbo
AU - Li, Jia
N1 - Publisher Copyright:
© 2014 IEEE.
PY - 2014/1/28
Y1 - 2014/1/28
N2 - The discrete distribution as a sparse representation, equipped with the Kantorovich-Wasserstein metric, has been proven effective in learning tasks on imagery data. However, clustering based on the Kantorovich metric under a principled optimization criterion is computationally challenging, and has not been adequately explored. In this paper, we focus on the scalability issue and develop a new algorithm for clustering distributions. An optimal centroid or representative distribution in the sense of the Kantorovich metric is solved for each cluster. The key idea is to adapt the state-of-the-art distributed optimization approach called alternating direction method of multipliers (ADMM). The new algorithm achieves linear complexity in the update of each centroid and can be easily parallelizable, improving significantly over the existing method. It is also observed that in practice, satisfactory results can be obtained after a few tens of iterations. We conduct experiments on both synthetic and real data to demonstrate the computational efficiency and accuracy of the new algorithm.
AB - The discrete distribution as a sparse representation, equipped with the Kantorovich-Wasserstein metric, has been proven effective in learning tasks on imagery data. However, clustering based on the Kantorovich metric under a principled optimization criterion is computationally challenging, and has not been adequately explored. In this paper, we focus on the scalability issue and develop a new algorithm for clustering distributions. An optimal centroid or representative distribution in the sense of the Kantorovich metric is solved for each cluster. The key idea is to adapt the state-of-the-art distributed optimization approach called alternating direction method of multipliers (ADMM). The new algorithm achieves linear complexity in the update of each centroid and can be easily parallelizable, improving significantly over the existing method. It is also observed that in practice, satisfactory results can be obtained after a few tens of iterations. We conduct experiments on both synthetic and real data to demonstrate the computational efficiency and accuracy of the new algorithm.
UR - http://www.scopus.com/inward/record.url?scp=84949927003&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84949927003&partnerID=8YFLogxK
U2 - 10.1109/ICIP.2014.7026066
DO - 10.1109/ICIP.2014.7026066
M3 - Conference contribution
AN - SCOPUS:84949927003
T3 - 2014 IEEE International Conference on Image Processing, ICIP 2014
SP - 5267
EP - 5271
BT - 2014 IEEE International Conference on Image Processing, ICIP 2014
PB - Institute of Electrical and Electronics Engineers Inc.
ER -