Weighted Edit Distance Computation: Strings, Trees, and Dyck

Debarati Das, Jacob Gilbert, Mohammad Taghi Hajiaghayi, Tomasz Kociumaka, Barna Saha

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations

Abstract

Given two strings of length n over an alphabet Σ and an upper bound k on their edit distance, the algorithm of Myers (Algorithmica'86) and Landau and Vishkin (JCSS'88) from almost forty years ago computes the unweighted string edit distance in O(n + k²) time. To date, it remains the fastest algorithm for exact edit distance computation, and it is optimal under the Strong Exponential Time Hypothesis (Backurs and Indyk; STOC'15). Over the years, this result has inspired many developments, including fast approximation algorithms for string edit distance as well as similar Õ(n + poly(k))-time algorithms for generalizations to tree and Dyck edit distances. Surprisingly, all these results hold only for unweighted instances.

While unweighted edit distance is theoretically fundamental, almost all real-world applications require weighted edit distance, where different weights are assigned to different edit operations (insertions, deletions, and substitutions), and the weights may vary with the characters being edited. Given a weight function w : (Σ ∪ {ε}) × (Σ ∪ {ε}) → ℝ≥0 such that w(a, a) = 0 and w(a, b) ≥ 1 for all a, b ∈ Σ ∪ {ε} with a ≠ b, the goal is to find an alignment that minimizes the total weight of edits. Except for the vanilla O(n²)-time dynamic-programming algorithm and its almost trivial O(nk)-time implementation (k being an upper bound on the sought total weight), none of the aforementioned developments on unweighted edit distance applies to the weighted variant.

In this paper, we propose the first O(n + poly(k))-time algorithm that computes the weighted string edit distance exactly, thus bridging a fundamental decades-old gap between our understanding of unweighted and weighted edit distance. We then generalize this result to weighted tree and Dyck edit distances, bringing in several new techniques that lead to a deterministic algorithm improving upon previous work even for unweighted tree edit distance.
Given how fundamental weighted edit distance is, we believe our O(n+poly(k))-time algorithm will be instrumental for further significant developments in the area.
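For readers who want a concrete picture of the O(nk)-time baseline the abstract mentions, here is a minimal Python sketch under stated assumptions; it is our reading of that textbook baseline, not the paper's O(n + poly(k)) algorithm. It assumes the weight function is a Python callable `w(a, b)` in which `None` plays the role of ε; the names `weighted_edit_distance`, `Symbol`, and `Weight` are hypothetical. The key observation is that w(a, b) ≥ 1 for a ≠ b makes every insertion or deletion cost at least 1, so an alignment of total weight at most k never leaves the band |i − j| ≤ ⌊k⌋ of the dynamic-programming table.

```python
import math
from typing import Callable, Dict, Optional

# Hypothetical encoding: None stands in for the empty symbol ε, so
# w(a, None) is the cost of deleting a and w(None, b) the cost of inserting b.
Symbol = Optional[str]
Weight = Callable[[Symbol, Symbol], float]


def weighted_edit_distance(x: str, y: str, w: Weight, k: float) -> Optional[float]:
    """Return the weighted edit distance of x and y if it is at most k, else None.

    Assumes w(a, a) == 0 and w(a, b) >= 1 whenever a != b. Every insertion or
    deletion then costs at least 1, so any alignment of total weight <= k stays
    within floor(k) cells of the main diagonal; only the band |i - j| <= floor(k)
    of the DP table is evaluated, i.e. O(n * k) cells instead of O(n^2).
    """
    n, m = len(x), len(y)
    band = math.floor(k)  # k must be finite
    if band < 0 or abs(n - m) > band:
        return None  # at least |n - m| insertions/deletions are unavoidable

    # Row 0 of the table: D[0][j] = cost of inserting y[:j].
    prev: Dict[int, float] = {0: 0.0}
    for j in range(1, min(m, band) + 1):
        prev[j] = prev[j - 1] + w(None, y[j - 1])

    for i in range(1, n + 1):
        cur: Dict[int, float] = {}
        # Only columns j with |i - j| <= band are stored; missing cells count as +inf.
        for j in range(max(0, i - band), min(m, i + band) + 1):
            best = prev.get(j, math.inf) + w(x[i - 1], None)  # delete x[i-1]
            if j > 0:
                best = min(
                    best,
                    cur.get(j - 1, math.inf) + w(None, y[j - 1]),       # insert y[j-1]
                    prev.get(j - 1, math.inf) + w(x[i - 1], y[j - 1]),  # match/substitute
                )
            cur[j] = best
        prev = cur

    d = prev[m]
    return d if d <= k else None
```

For instance, with unit weights `lambda a, b: 0.0 if a == b else 1.0`, the call `weighted_edit_distance("kitten", "sitting", ..., 3)` should return 3.0, whereas passing k = 2 returns None. Storing each row as a dictionary keeps memory proportional to the band width; the band restriction is sound because a band-confined path is still a valid alignment, so the banded value can only overestimate, and it is exact whenever the true distance is at most k.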

Original language: English (US)
Title of host publication: STOC 2023 - Proceedings of the 55th Annual ACM Symposium on Theory of Computing
Editors: Barna Saha, Rocco A. Servedio
Publisher: Association for Computing Machinery
Pages: 377-390
Number of pages: 14
ISBN (Electronic): 9781450399135
State: Published - Jun 2 2023
Event: 55th Annual ACM Symposium on Theory of Computing, STOC 2023 - Orlando, United States
Duration: Jun 20 2023 → Jun 23 2023

Publication series

Name: Proceedings of the Annual ACM Symposium on Theory of Computing
ISSN (Print): 0737-8017

Conference

Conference: 55th Annual ACM Symposium on Theory of Computing, STOC 2023
Country/Territory: United States
City: Orlando
Period: 6/20/23 → 6/23/23

All Science Journal Classification (ASJC) codes

  • Software
