TY - GEN
T1 - Improved Approximation Algorithms for Dyck Edit Distance and RNA Folding
AU - Das, Debarati
AU - Kociumaka, Tomasz
AU - Saha, Barna
N1 - Publisher Copyright:
© Debarati Das, Tomasz Kociumaka, and Barna Saha; licensed under Creative Commons License CC-BY 4.0
PY - 2022/7/1
Y1 - 2022/7/1
N2 - The Dyck language, which consists of well-balanced sequences of parentheses, is one of the most fundamental context-free languages. The Dyck edit distance quantifies the number of edits (character insertions, deletions, and substitutions) required to make a given length-n parenthesis sequence well-balanced. RNA Folding involves a similar problem, where a closing parenthesis can match an opening parenthesis of the same type irrespective of their ordering. For example, in RNA Folding, both () and)(are valid matches, whereas the Dyck language only allows () as a match. Both of these problems have been studied extensively in the literature. Using fast matrix multiplication, it is possible to compute their exact solutions in time O(n2.687) (Chi, Duan, Xie, Zhang, STOC'22), and a (1 + ϵ)-multiplicative approximation is known with a running time of Ω(n2.372). The impracticality of fast matrix multiplication often makes combinatorial algorithms much more desirable. Unfortunately, it is known that the problems of (exactly) computing the Dyck edit distance and the folding distance are at least as hard as Boolean matrix multiplication. Thereby, they are unlikely to admit truly subcubic-time combinatorial algorithms. In terms of fast approximation algorithms that are combinatorial in nature, the state of the art for Dyck edit distance is an O(log n)-factor approximation algorithm that runs in near-linear time (Saha, FOCS'14), whereas for RNA Folding only an ϵn-additive approximation in (Equation presented) time (Saha, FOCS'17) is known. In this paper, we make substantial improvements to the state of the art for Dyck edit distance (with any number of parenthesis types). We design a constant-factor approximation algorithm that runs in Õ(n1.971) time (the first constant-factor approximation in subquadratic time). Moreover, we develop a (1 + ϵ)-factor approximation algorithm running in (Equation presented) time, which improves upon the earlier additive approximation. Finally, we design a (3 + ϵ)-approximation that takes (Equation presented) time, where d ≥ 1 is an upper bound on the sought distance. As for RNA folding, for any s ≥ 1, we design a factor-s approximation algorithm that runs in O(n + (n/s)3) time. To the best of our knowledge, this is the first nontrivial approximation algorithm for RNA Folding that can go below the n2 barrier. All our algorithms are combinatorial in nature.
AB - The Dyck language, which consists of well-balanced sequences of parentheses, is one of the most fundamental context-free languages. The Dyck edit distance quantifies the number of edits (character insertions, deletions, and substitutions) required to make a given length-n parenthesis sequence well-balanced. RNA Folding involves a similar problem, where a closing parenthesis can match an opening parenthesis of the same type irrespective of their ordering. For example, in RNA Folding, both () and)(are valid matches, whereas the Dyck language only allows () as a match. Both of these problems have been studied extensively in the literature. Using fast matrix multiplication, it is possible to compute their exact solutions in time O(n2.687) (Chi, Duan, Xie, Zhang, STOC'22), and a (1 + ϵ)-multiplicative approximation is known with a running time of Ω(n2.372). The impracticality of fast matrix multiplication often makes combinatorial algorithms much more desirable. Unfortunately, it is known that the problems of (exactly) computing the Dyck edit distance and the folding distance are at least as hard as Boolean matrix multiplication. Thereby, they are unlikely to admit truly subcubic-time combinatorial algorithms. In terms of fast approximation algorithms that are combinatorial in nature, the state of the art for Dyck edit distance is an O(log n)-factor approximation algorithm that runs in near-linear time (Saha, FOCS'14), whereas for RNA Folding only an ϵn-additive approximation in (Equation presented) time (Saha, FOCS'17) is known. In this paper, we make substantial improvements to the state of the art for Dyck edit distance (with any number of parenthesis types). We design a constant-factor approximation algorithm that runs in Õ(n1.971) time (the first constant-factor approximation in subquadratic time). Moreover, we develop a (1 + ϵ)-factor approximation algorithm running in (Equation presented) time, which improves upon the earlier additive approximation. Finally, we design a (3 + ϵ)-approximation that takes (Equation presented) time, where d ≥ 1 is an upper bound on the sought distance. As for RNA folding, for any s ≥ 1, we design a factor-s approximation algorithm that runs in O(n + (n/s)3) time. To the best of our knowledge, this is the first nontrivial approximation algorithm for RNA Folding that can go below the n2 barrier. All our algorithms are combinatorial in nature.
UR - http://www.scopus.com/inward/record.url?scp=85133417992&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85133417992&partnerID=8YFLogxK
U2 - 10.4230/LIPIcs.ICALP.2022.49
DO - 10.4230/LIPIcs.ICALP.2022.49
M3 - Conference contribution
AN - SCOPUS:85133417992
T3 - Leibniz International Proceedings in Informatics, LIPIcs
BT - 49th EATCS International Conference on Automata, Languages, and Programming, ICALP 2022
A2 - Bojanczyk, Mikolaj
A2 - Merelli, Emanuela
A2 - Woodruff, David P.
PB - Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing
T2 - 49th EATCS International Conference on Automata, Languages, and Programming, ICALP 2022
Y2 - 4 July 2022 through 8 July 2022
ER -