Alexandr Andoni, Robert Krauthgamer, Krzysztof Onak
Polylogarithmic Approximation for Edit Distance and the Asymmetric Query Complexity
2010 IEEE 51st Annual Symposium on Foundations of Computer Science, published 2010-05-21
DOI: 10.1109/FOCS.2010.43 (https://doi.org/10.1109/FOCS.2010.43)
Citations: 117
Abstract
We present a near-linear time algorithm that approximates the edit distance between two strings within a polylogarithmic factor. For strings of length $n$ and every fixed $\varepsilon > 0$, the algorithm computes a $(\log n)^{O(1/\varepsilon)}$ approximation in $n^{1+\varepsilon}$ time. This is an exponential improvement over the previously known approximation factor, $2^{\tilde{O}(\sqrt{\log n})}$, with a comparable running time [Ostrovsky and Rabani, J. ACM 2007; Andoni and Onak, STOC 2009]. This result arises naturally in the study of a new asymmetric query model. In this model, the input consists of two strings $x$ and $y$, and an algorithm can access $y$ in an unrestricted manner, while being charged for each symbol of $x$ that it queries. Indeed, we obtain our main result by designing an algorithm that makes a small number of queries in this model. We then provide a nearly matching lower bound on the number of queries. Our lower bound is the first to expose hardness of edit distance stemming from the input strings being "repetitive", which means that many of their substrings are approximately identical. Consequently, our lower bound provides the first rigorous separation between edit distance and Ulam distance.
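To make the cost measure of the asymmetric query model concrete, here is a minimal sketch. It is not the paper's algorithm: it is the textbook quadratic-time dynamic program for exact edit distance, run against a hypothetical `CountedString` wrapper (a name introduced here purely for illustration) that records which positions of $x$ are read. The second string $y$ is accessed freely and is not counted.

```python
class CountedString:
    """Illustrative wrapper (hypothetical, not from the paper): holds x and
    records every distinct position of x that is read."""

    def __init__(self, s):
        self._s = s
        self.queried = set()  # distinct indices of x read so far

    def __len__(self):
        # Knowing the length is treated as free in this sketch (an assumption).
        return len(self._s)

    def __getitem__(self, i):
        self.queried.add(i)
        return self._s[i]


def edit_distance(x, y):
    """Exact edit distance via the classic row-by-row dynamic program.

    Uses unit-cost insertions, deletions, and substitutions.
    Runs in O(len(x) * len(y)) time and reads every symbol of x once.
    """
    n, m = len(x), len(y)
    prev = list(range(m + 1))  # distances from the empty prefix of x
    for i in range(1, n + 1):
        xi = x[i - 1]  # one charged query to x per row
        cur = [i] + [0] * m
        for j in range(1, m + 1):
            cost = 0 if xi == y[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # delete x[i-1]
                         cur[j - 1] + 1,      # insert y[j-1]
                         prev[j - 1] + cost)  # match or substitute
        prev = cur
    return prev[m]
```

On `x = CountedString("kitten")` and `y = "sitting"`, this baseline reads all six positions of `x`. The abstract's point is that a polylogarithmic approximation can be obtained while querying only a small number of symbols of $x$; this exact baseline makes no attempt to do that.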