TY - GEN

T1 - Clustering Permutations

T2 - 14th Innovations in Theoretical Computer Science Conference, ITCS 2023

AU - Chakraborty, Diptarka

AU - Das, Debarati

AU - Krauthgamer, Robert

N1 - Funding Information:
Funding Diptarka Chakraborty: Work partially supported by an MoE AcRF Tier 2 grant (WBS No. A-8000416-00-00) and an NUS ODPRT grant (WBS No. A-0008078-00-00). Robert Krauthgamer: Work partially supported by ONR Award N00014-18-1-2364, the Israel Science Foundation grant #1086/18, and a Minerva Foundation grant, and by the Israeli Council for Higher Education (CHE) via the Weizmann Data Science Research Center.
Publisher Copyright:
© Diptarka Chakraborty, Debarati Das, and Robert Krauthgamer; licensed under Creative Commons License CC-BY 4.0.

PY - 2023/1/1

Y1 - 2023/1/1

N2 - We study the classical metric k-median clustering problem over a set of input rankings (i.e., permutations), which has myriad applications, from social-choice theory to web search and databases. A folklore algorithm provides a 2-approximate solution in polynomial time for all k = O(1), and works irrespective of the underlying distance measure, so long it is a metric; however, going below the 2-factor is a notorious challenge. We consider the Ulam distance, a variant of the well-known edit-distance metric, where strings are restricted to be permutations. For this metric, Chakraborty, Das, and Krauthgamer [SODA, 2021] provided a (2 − δ)-approximation algorithm for k = 1, where δ ≈ 2−40. Our primary contribution is a new algorithmic framework for clustering a set of permutations. Our first result is a 1.999-approximation algorithm for the metric k-median problem under the Ulam metric, that runs in time (k log(nd))O(k)nd3 for an input consisting of n permutations over [d]. In fact, our framework is powerful enough to extend this result to the streaming model (where the n input permutations arrive one by one) using only polylogarithmic (in n) space. Additionally, we show that similar results can be obtained even in the presence of outliers, which is presumably a more difficult problem.

AB - We study the classical metric k-median clustering problem over a set of input rankings (i.e., permutations), which has myriad applications, from social-choice theory to web search and databases. A folklore algorithm provides a 2-approximate solution in polynomial time for all k = O(1), and works irrespective of the underlying distance measure, so long it is a metric; however, going below the 2-factor is a notorious challenge. We consider the Ulam distance, a variant of the well-known edit-distance metric, where strings are restricted to be permutations. For this metric, Chakraborty, Das, and Krauthgamer [SODA, 2021] provided a (2 − δ)-approximation algorithm for k = 1, where δ ≈ 2−40. Our primary contribution is a new algorithmic framework for clustering a set of permutations. Our first result is a 1.999-approximation algorithm for the metric k-median problem under the Ulam metric, that runs in time (k log(nd))O(k)nd3 for an input consisting of n permutations over [d]. In fact, our framework is powerful enough to extend this result to the streaming model (where the n input permutations arrive one by one) using only polylogarithmic (in n) space. Additionally, we show that similar results can be obtained even in the presence of outliers, which is presumably a more difficult problem.

UR - http://www.scopus.com/inward/record.url?scp=85147540331&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85147540331&partnerID=8YFLogxK

U2 - 10.4230/LIPIcs.ITCS.2023.31

DO - 10.4230/LIPIcs.ITCS.2023.31

M3 - Conference contribution

AN - SCOPUS:85147540331

T3 - Leibniz International Proceedings in Informatics, LIPIcs

BT - 14th Innovations in Theoretical Computer Science Conference, ITCS 2023

A2 - Kalai, Yael Tauman

PB - Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing

Y2 - 10 January 2023 through 13 January 2023

ER -