TY - GEN
T1 - Weighted Edit Distance Computation
T2 - 55th Annual ACM Symposium on Theory of Computing, STOC 2023
AU - Das, Debarati
AU - Gilbert, Jacob
AU - Hajiaghayi, Mohammad Taghi
AU - Kociumaka, Tomasz
AU - Saha, Barna
N1 - Publisher Copyright:
© 2023 ACM.
PY - 2023/6/2
Y1 - 2023/6/2
N2 - Given two strings of length n over alphabet ς, and an upper bound k on their edit distance, the algorithm of Myers (Algorithmica'86) and Landau and Vishkin (JCSS'88) from almost forty years back computes the unweighted string edit distance in O(n+k2) time. To date, it remains the fastest algorithm for exact edit distance computation, and it is optimal under the Strong Exponential Hypothesis (Backurs and Indyk; STOC'15). Over the years, this result has inspired many developments, including fast approximation algorithms for string edit distance as well as similar Õ(n+poly(k))-time algorithms for generalizations to tree and Dyck edit distances. Surprisingly, all these results hold only for unweighted instances. While unweighted edit distance is theoretically fundamental, almost all real-world applications require weighted edit distance, where different weights are assigned to different edit operations (insertions, deletions, and substitutions), and the weights may vary with the characters being edited. Given a weight function w : ς {ϵ} × ς {ϵ} → ≥ 0 (such that w(a,a) = 0 and w(a,b) ≥ 1 for all a, b ς {ϵ} with a b), the goal is to find an alignment that minimizes the total weight of edits. Except for the vanilla O(n2)-time dynamic-programming algorithm and its almost trivial O(nk)-time implementation (k being an upper bound on the sought total weight), none of the aforementioned developments on the unweighted edit distance applies to the weighted variant. In this paper, we propose the first O(n+poly(k))-time algorithm that computes the weighted string edit distance exactly, thus bridging a fundamental decades-old gap between our understanding of unweighted and weighted edit distance. We then generalize this result to the weighted tree and Dyck edit distances, bringing in several new techniques, which lead to a deterministic algorithm that improves upon the previous work even for unweighted tree edit distance. Given how fundamental weighted edit distance is, we believe our O(n+poly(k))-time algorithm will be instrumental for further significant developments in the area.
AB - Given two strings of length n over alphabet ς, and an upper bound k on their edit distance, the algorithm of Myers (Algorithmica'86) and Landau and Vishkin (JCSS'88) from almost forty years back computes the unweighted string edit distance in O(n+k2) time. To date, it remains the fastest algorithm for exact edit distance computation, and it is optimal under the Strong Exponential Hypothesis (Backurs and Indyk; STOC'15). Over the years, this result has inspired many developments, including fast approximation algorithms for string edit distance as well as similar Õ(n+poly(k))-time algorithms for generalizations to tree and Dyck edit distances. Surprisingly, all these results hold only for unweighted instances. While unweighted edit distance is theoretically fundamental, almost all real-world applications require weighted edit distance, where different weights are assigned to different edit operations (insertions, deletions, and substitutions), and the weights may vary with the characters being edited. Given a weight function w : ς {ϵ} × ς {ϵ} → ≥ 0 (such that w(a,a) = 0 and w(a,b) ≥ 1 for all a, b ς {ϵ} with a b), the goal is to find an alignment that minimizes the total weight of edits. Except for the vanilla O(n2)-time dynamic-programming algorithm and its almost trivial O(nk)-time implementation (k being an upper bound on the sought total weight), none of the aforementioned developments on the unweighted edit distance applies to the weighted variant. In this paper, we propose the first O(n+poly(k))-time algorithm that computes the weighted string edit distance exactly, thus bridging a fundamental decades-old gap between our understanding of unweighted and weighted edit distance. We then generalize this result to the weighted tree and Dyck edit distances, bringing in several new techniques, which lead to a deterministic algorithm that improves upon the previous work even for unweighted tree edit distance. Given how fundamental weighted edit distance is, we believe our O(n+poly(k))-time algorithm will be instrumental for further significant developments in the area.
UR - http://www.scopus.com/inward/record.url?scp=85163056020&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85163056020&partnerID=8YFLogxK
U2 - 10.1145/3564246.3585178
DO - 10.1145/3564246.3585178
M3 - Conference contribution
AN - SCOPUS:85163056020
T3 - Proceedings of the Annual ACM Symposium on Theory of Computing
SP - 377
EP - 390
BT - STOC 2023 - Proceedings of the 55th Annual ACM Symposium on Theory of Computing
A2 - Saha, Barna
A2 - Servedio, Rocco A.
PB - Association for Computing Machinery
Y2 - 20 June 2023 through 23 June 2023
ER -