The Streaming k-Mismatch Problem: Tradeoffs between Space and Total Time
Shay Golan, T. Kociumaka, T. Kopelowitz, E. Porat
Annual Symposium on Combinatorial Pattern Matching, published 2020-04-27
DOI: 10.4230/LIPIcs.CPM.2020.15 (https://doi.org/10.4230/LIPIcs.CPM.2020.15)
Citations: 5
Abstract
We revisit the $k$-mismatch problem in the streaming model on a pattern of length $m$ and a streaming text of length $n$, both over a size-$\sigma$ alphabet. The current state-of-the-art algorithm for the streaming $k$-mismatch problem, by Clifford et al. [SODA 2019], uses $\tilde O(k)$ space and $\tilde O\big(\sqrt k\big)$ worst-case time per character. The space complexity is known to be (unconditionally) optimal, and the worst-case time per character matches a conditional lower bound. However, there is a gap between the total time cost of the algorithm, which is $\tilde O(n\sqrt k)$, and the fastest known offline algorithm, which costs $\tilde O\big(n + \min\big(\frac{nk}{\sqrt m},\sigma n\big)\big)$ time. Moreover, it is not known whether improvements over the $\tilde O(n\sqrt k)$ total time are possible when using more than $O(k)$ space.
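To make the problem statement concrete, here is a brute-force baseline for the streaming setting. It is not the algorithm of Clifford et al. or of this paper: it keeps the last $m$ text characters, so it uses $O(m)$ space and up to $O(m)$ time per character. The function name `stream_k_mismatch_naive` and the convention of reporting `None` when the distance exceeds $k$ (or when fewer than $m$ characters have arrived) are assumptions made for illustration.

```python
from collections import deque
from typing import Iterable, Iterator, Optional


def stream_k_mismatch_naive(
    pattern: str, text: Iterable[str], k: int
) -> Iterator[Optional[int]]:
    """After each text character, yield the Hamming distance between the
    pattern and the last len(pattern) text characters if it is at most k,
    and None otherwise (including while the window is still filling)."""
    m = len(pattern)
    window = deque(maxlen=m)  # sliding window over the most recent m characters
    for ch in text:
        window.append(ch)
        if len(window) < m:
            yield None
            continue
        mismatches = 0
        for p, t in zip(pattern, window):
            if p != t:
                mismatches += 1
                if mismatches > k:  # stop early once the threshold is exceeded
                    break
        yield mismatches if mismatches <= k else None
```

For example, `list(stream_k_mismatch_naive("aba", "abababb", 1))` gives `[None, None, 0, None, 0, None, 1]`. The streaming algorithms discussed here produce the same kind of output while storing far less than the full window.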
We address these gaps by designing a randomized streaming algorithm for the $k$-mismatch problem that, given an integer parameter $k\le s \le m$, uses $\tilde O(s)$ space and costs $\tilde O\big(n+\min\big(\frac {nk^2}m,\frac{nk}{\sqrt s},\frac{\sigma nm}s\big)\big)$ total time. For $s=m$, the total runtime becomes $\tilde O\big(n + \min\big(\frac{nk}{\sqrt m},\sigma n\big)\big)$, which matches the time cost of the fastest offline algorithm. Moreover, the worst-case time cost per character is still $\tilde O\big(\sqrt k\big)$.
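The two endpoints of the tradeoff can be checked by direct substitution. The following is a worked check under the stated bound, not a derivation taken from the paper; the shorthand $T(s)$ and the case split on $k$ versus $\sqrt m$ are introduced here for illustration.

```latex
\begin{align*}
T(s) &= n + \min\Big(\frac{nk^2}{m},\ \frac{nk}{\sqrt s},\ \frac{\sigma n m}{s}\Big)
  && \text{(total time, up to polylog factors)} \\[4pt]
T(m) &= n + \min\Big(\frac{nk^2}{m},\ \frac{nk}{\sqrt m},\ \sigma n\Big)
  && \text{(substituting } s = m\text{)} \\[4pt]
k \le \sqrt m:&\quad \frac{nk^2}{m} \le n \ \text{ and } \ \frac{nk}{\sqrt m} \le n
  && \text{(both terms are absorbed by the additive } n\text{)} \\[4pt]
k > \sqrt m:&\quad \frac{nk^2}{m} = \frac{nk}{\sqrt m}\cdot\frac{k}{\sqrt m} > \frac{nk}{\sqrt m}
  && \text{(the first term never attains the minimum)} \\[4pt]
\Rightarrow\ T(m) &= \Theta\Big(n + \min\Big(\frac{nk}{\sqrt m},\ \sigma n\Big)\Big) \\[4pt]
T(k) &\le n + \frac{nk}{\sqrt k} = n + n\sqrt k
  && \text{(substituting } s = k\text{)}
\end{align*}
```

So at $s=m$ the bound coincides with the offline running time, and at $s=k$ it is no worse than the $\tilde O(n\sqrt k)$ total time of the earlier $\tilde O(k)$-space algorithm.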