The discrete distribution as a sparse representation, equipped with the Kantorovich-Wasserstein metric, has been proven effective in learning tasks on imagery data. However, clustering based on the Kantorovich metric under a principled optimization criterion is computationally challenging, and has not been adequately explored. In this paper, we focus on the scalability issue and develop a new algorithm for clustering distributions. An optimal centroid or representative distribution in the sense of the Kantorovich metric is solved for each cluster. The key idea is to adapt the state-of-the-art distributed optimization approach called alternating direction method of multipliers (ADMM). The new algorithm achieves linear complexity in the update of each centroid and can be easily parallelizable, improving significantly over the existing method. It is also observed that in practice, satisfactory results can be obtained after a few tens of iterations. We conduct experiments on both synthetic and real data to demonstrate the computational efficiency and accuracy of the new algorithm.