TY - GEN
T1 - Breaking the Two Approximation Barrier for Various Consensus Clustering Problems
AU - Das, Debarati
AU - Kumar, Amit
N1 - Publisher Copyright:
Copyright © 2025 by SIAM.
PY - 2025
Y1 - 2025
N2 - Consensus clustering is a method used to combine the results of multiple clustering algorithms applied to the same dataset. The goal is to find a single clustering solution that represents the common structure present in the various input clusterings, thereby improving the overall quality and robustness of the clustering outcome. Given a set of clusterings on a fixed dataset, the problem of finding a clustering that minimizes the total number of pairwise disagreements with the input clusterings is known as the median partition problem. While Wakabayashi [Resenhas 1998] showed that computing an exact median partition is NP-hard, Ailon et al. [STOC 2005] presented a 4/3-approximation algorithm for this problem. In this work, we provide a significant generalization of the median partition problem. Here, rather than a set of partitions, the input consists of several metric spaces on a fixed set. The goal is to identify a metric space that minimizes the total distance to the input metric spaces. We demonstrate that if the collection of allowed metric spaces adheres to a specific proximity property, a (2-ζ) approximation, where ζ > 0 is a constant, can be achieved for this problem. As a direct consequence of our result, we get a strictly < 2 approximation for the following problem: given a set of ultrametrics (or tree metrics), determine an ultrametric that minimizes the total distance from the input ultrametrics. This essentially addresses the open problem posed by Ailon and Charikar [FOCS 2005] for the setting of ultrametrics and tree metrics. Next, we study the more non-trivial 1-center problem in the context of consensus clustering. In this problem, given m clusterings defined on a set of n elements, the goal is to find a clustering that minimizes the maximum number of pairwise disagreements to the given input clusterings. Unlike its 1-median counterpart, nothing better than a folklore 2-approximation is known for this NP-hard problem. This limitation mainly arises from its sensitivity to outliers. The most significant contribution of this work presents the first approximation algorithm that breaks the 2-approximation barrier for the 1-center consensus clustering problem. We remark that this algorithm introduces several fundamental tools and techniques that we believe have wide applicability, particularly in providing below 2-approximation for the 1-center problem across various related metrics, including ranking problems such as Kendall tau, Ulam distance, Spearman's Footrule, and others.
AB - Consensus clustering is a method used to combine the results of multiple clustering algorithms applied to the same dataset. The goal is to find a single clustering solution that represents the common structure present in the various input clusterings, thereby improving the overall quality and robustness of the clustering outcome. Given a set of clusterings on a fixed dataset, the problem of finding a clustering that minimizes the total number of pairwise disagreements with the input clusterings is known as the median partition problem. While Wakabayashi [Resenhas 1998] showed that computing an exact median partition is NP-hard, Ailon et al. [STOC 2005] presented a 4/3-approximation algorithm for this problem. In this work, we provide a significant generalization of the median partition problem. Here, rather than a set of partitions, the input consists of several metric spaces on a fixed set. The goal is to identify a metric space that minimizes the total distance to the input metric spaces. We demonstrate that if the collection of allowed metric spaces adheres to a specific proximity property, a (2-ζ) approximation, where ζ > 0 is a constant, can be achieved for this problem. As a direct consequence of our result, we get a strictly < 2 approximation for the following problem: given a set of ultrametrics (or tree metrics), determine an ultrametric that minimizes the total distance from the input ultrametrics. This essentially addresses the open problem posed by Ailon and Charikar [FOCS 2005] for the setting of ultrametrics and tree metrics. Next, we study the more non-trivial 1-center problem in the context of consensus clustering. In this problem, given m clusterings defined on a set of n elements, the goal is to find a clustering that minimizes the maximum number of pairwise disagreements to the given input clusterings. Unlike its 1-median counterpart, nothing better than a folklore 2-approximation is known for this NP-hard problem. This limitation mainly arises from its sensitivity to outliers. The most significant contribution of this work presents the first approximation algorithm that breaks the 2-approximation barrier for the 1-center consensus clustering problem. We remark that this algorithm introduces several fundamental tools and techniques that we believe have wide applicability, particularly in providing below 2-approximation for the 1-center problem across various related metrics, including ranking problems such as Kendall tau, Ulam distance, Spearman's Footrule, and others.
UR - https://www.scopus.com/pages/publications/85216423735
UR - https://www.scopus.com/inward/citedby.url?scp=85216423735&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85216423735
T3 - Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms
SP - 323
EP - 372
BT - Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2025
PB - Association for Computing Machinery
T2 - 36th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2025
Y2 - 12 January 2025 through 15 January 2025
ER -