## Abstract

We show how to determine whether the edit distance between two given strings is small in sublinear time. Specifically, we present a test which, given two n-character strings A and B, runs in time o(n) and with high probability returns "CLOSE" if their edit distance is O(n^{α}), and "FAR" if their edit distance is Ω(n), where α is a fixed parameter less than 1. Our algorithm for testing the edit distance works by recursively subdividing the strings A and B into smaller substrings and looking for pairs of substrings in A, B with small edit distance. To do this, we query both strings at random places using a special technique for economizing on the samples which does not pick the samples independently and provides better query and overall complexity. As a result, our test runs in time Õ(n^{max{α/2,2α-1}}) for any fixed α < 1. Our algorithm thus provides a trade-off between accuracy and efficiency that is particularly useful when the input data is very large. We also show a lower bound of Ω(n^{α/2}) on the query complexity of every algorithm that distinguishes pairs of strings with edit distance at most n^{α} from those with edit distance at least n/6.

Original language | English (US) |
---|---|

Pages (from-to) | 316-324 |

Number of pages | 9 |

Journal | Conference Proceedings of the Annual ACM Symposium on Theory of Computing |

DOIs | |

State | Published - 2003 |

Event | 35th Annual ACM Symposium on Theory of Computing - San Diego, CA, United States Duration: Jun 9 2003 → Jun 11 2003 |

## All Science Journal Classification (ASJC) codes

- Software