Breaking the Two Approximation Barrier for Various Consensus Clustering Problems

Debarati Das, Amit Kumar

Research output: Chapter in Book/Report/Conference proceedingConference contribution

1 Scopus citations

Abstract

Consensus clustering is a method used to combine the results of multiple clustering algorithms applied to the same dataset. The goal is to find a single clustering solution that represents the common structure present in the various input clusterings, thereby improving the overall quality and robustness of the clustering outcome. Given a set of clusterings on a fixed dataset, the problem of finding a clustering that minimizes the total number of pairwise disagreements with the input clusterings is known as the median partition problem. While Wakabayashi [Resenhas 1998] showed that computing an exact median partition is NP-hard, Ailon et al. [STOC 2005] presented a 4/3-approximation algorithm for this problem. In this work, we provide a significant generalization of the median partition problem. Here, rather than a set of partitions, the input consists of several metric spaces on a fixed set. The goal is to identify a metric space that minimizes the total distance to the input metric spaces. We demonstrate that if the collection of allowed metric spaces adheres to a specific proximity property, a (2-ζ) approximation, where ζ > 0 is a constant, can be achieved for this problem. As a direct consequence of our result, we get a strictly < 2 approximation for the following problem: given a set of ultrametrics (or tree metrics), determine an ultrametric that minimizes the total distance from the input ultrametrics. This essentially addresses the open problem posed by Ailon and Charikar [FOCS 2005] for the setting of ultrametrics and tree metrics. Next, we study the more non-trivial 1-center problem in the context of consensus clustering. In this problem, given m clusterings defined on a set of n elements, the goal is to find a clustering that minimizes the maximum number of pairwise disagreements to the given input clusterings. Unlike its 1-median counterpart, nothing better than a folklore 2-approximation is known for this NP-hard problem. This limitation mainly arises from its sensitivity to outliers. The most significant contribution of this work presents the first approximation algorithm that breaks the 2-approximation barrier for the 1-center consensus clustering problem. We remark that this algorithm introduces several fundamental tools and techniques that we believe have wide applicability, particularly in providing below 2-approximation for the 1-center problem across various related metrics, including ranking problems such as Kendall tau, Ulam distance, Spearman's Footrule, and others.

Original languageEnglish (US)
Title of host publicationAnnual ACM-SIAM Symposium on Discrete Algorithms, SODA 2025
PublisherAssociation for Computing Machinery
Pages323-372
Number of pages50
ISBN (Electronic)9798331312008
StatePublished - 2025
Event36th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2025 - New Orleans, United States
Duration: Jan 12 2025Jan 15 2025

Publication series

NameProceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms
Volume1
ISSN (Print)1071-9040
ISSN (Electronic)1557-9468

Conference

Conference36th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2025
Country/TerritoryUnited States
CityNew Orleans
Period1/12/251/15/25

All Science Journal Classification (ASJC) codes

  • Software
  • General Mathematics

Fingerprint

Dive into the research topics of 'Breaking the Two Approximation Barrier for Various Consensus Clustering Problems'. Together they form a unique fingerprint.

Cite this