Clustering Permutations: New Techniques with Streaming Applications

Diptarka Chakraborty, Debarati Das, Robert Krauthgamer

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We study the classical metric k-median clustering problem over a set of input rankings (i.e., permutations), which has myriad applications, from social-choice theory to web search and databases. A folklore algorithm provides a 2-approximate solution in polynomial time for all k = O(1), and works irrespective of the underlying distance measure, so long as it is a metric; however, going below the 2-factor is a notorious challenge. We consider the Ulam distance, a variant of the well-known edit-distance metric, where strings are restricted to be permutations. For this metric, Chakraborty, Das, and Krauthgamer [SODA, 2021] provided a (2 − δ)-approximation algorithm for k = 1, where δ ≈ 2^{−40}. Our primary contribution is a new algorithmic framework for clustering a set of permutations. Our first result is a 1.999-approximation algorithm for the metric k-median problem under the Ulam metric, which runs in time (k log(nd))^{O(k)} · n · d^3 for an input consisting of n permutations over [d]. In fact, our framework is powerful enough to extend this result to the streaming model (where the n input permutations arrive one by one) using only polylogarithmic (in n) space. Additionally, we show that similar results can be obtained even in the presence of outliers, which is presumably a more difficult problem.
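As an illustration of the baseline the abstract refers to, the following is a minimal Python sketch, not the paper's algorithm. It assumes the common definition of the Ulam distance between two permutations of the same d symbols as the minimum number of character moves, which equals d minus the length of their longest common subsequence. It also assumes the folklore 2-approximation takes the usual form: restrict the k centers to input points and try every k-subset, which loses at most a factor of 2 by the triangle inequality. The names `ulam_distance` and `best_input_centers` are hypothetical and chosen only for this example.

```python
from bisect import bisect_left
from itertools import combinations


def ulam_distance(x, y):
    """Ulam distance, assumed here to be d - LCS(x, y) for permutations x, y."""
    # Relabel y by each symbol's position in x; the LCS of x and y then
    # becomes a longest increasing subsequence of the relabeled sequence.
    pos = {sym: i for i, sym in enumerate(x)}
    seq = [pos[sym] for sym in y]
    # tails[i] holds the smallest possible last value of an increasing
    # subsequence of length i + 1 (patience-sorting style LIS).
    tails = []
    for v in seq:
        i = bisect_left(tails, v)
        if i == len(tails):
            tails.append(v)
        else:
            tails[i] = v
    return len(x) - len(tails)


def best_input_centers(perms, k):
    """Try every k-subset of the input as centers; return (cost, centers)."""
    best = None
    for centers in combinations(perms, k):
        # k-median cost: each permutation pays its distance to the nearest center.
        cost = sum(min(ulam_distance(p, c) for c in centers) for p in perms)
        if best is None or cost < best[0]:
            best = (cost, centers)
    return best


perms = [(1, 2, 3, 4), (2, 1, 3, 4), (1, 2, 4, 3)]
cost, centers = best_input_centers(perms, k=1)
```

The brute-force search examines all k-subsets of the n inputs, so it is polynomial only for constant k, matching the k = O(1) regime mentioned in the abstract. It works for any metric, which is why it cannot by itself go below the factor of 2.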

Original language: English (US)
Title of host publication: 14th Innovations in Theoretical Computer Science Conference, ITCS 2023
Editors: Yael Tauman Kalai
Publisher: Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
ISBN (Electronic): 9783959772631
State: Published - Jan 1 2023
Event: 14th Innovations in Theoretical Computer Science Conference, ITCS 2023 - Cambridge, United States
Duration: Jan 10 2023 to Jan 13 2023

Publication series

Name: Leibniz International Proceedings in Informatics, LIPIcs
Volume: 251
ISSN (Print): 1868-8969

Conference

Conference: 14th Innovations in Theoretical Computer Science Conference, ITCS 2023
Country/Territory: United States
City: Cambridge
Period: 1/10/23 to 1/13/23

All Science Journal Classification (ASJC) codes

  • Software
