The Streaming k-Mismatch Problem: Tradeoffs between Space and Total Time
Shay Golan, T. Kociumaka, T. Kopelowitz, E. Porat
Annual Symposium on Combinatorial Pattern Matching, published 2020-04-27. DOI: 10.4230/LIPIcs.CPM.2020.15 (https://doi.org/10.4230/LIPIcs.CPM.2020.15)
We revisit the $k$-mismatch problem in the streaming model on a pattern of length $m$ and a streaming text of length $n$, both over a size-$\sigma$ alphabet. The current state-of-the-art algorithm for the streaming $k$-mismatch problem, by Clifford et al. [SODA 2019], uses $\tilde O(k)$ space and $\tilde O\big(\sqrt k\big)$ worst-case time per character. The space complexity is known to be (unconditionally) optimal, and the worst-case time per character matches a conditional lower bound. However, there is a gap between the total time cost of the algorithm, which is $\tilde O(n\sqrt k)$, and the fastest known offline algorithm, which costs $\tilde O\big(n + \min\big(\frac{nk}{\sqrt m},\sigma n\big)\big)$ time. Moreover, it is not known whether improvements over the $\tilde O(n\sqrt k)$ total time are possible when using more than $O(k)$ space.
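To make the problem statement concrete, the following is a minimal baseline sketch of the streaming $k$-mismatch problem. It is not the algorithm of Clifford et al. or of this paper: it buffers the last $m$ text characters, so it uses $O(m)$ space and $O(nm)$ total time in the worst case, far from the bounds discussed above. The function name `naive_streaming_k_mismatch` and the convention of reporting `None` when the distance exceeds $k$ are assumptions made for illustration.

```python
from collections import deque
from typing import Iterable, Iterator, Optional, Tuple


def naive_streaming_k_mismatch(
    pattern: str, text: Iterable[str], k: int
) -> Iterator[Tuple[int, Optional[int]]]:
    """Baseline (hypothetical helper, not the paper's algorithm).

    For every text position i at which a full length-m window ends,
    yield (i, d), where d is the Hamming distance between the pattern
    and the window if d <= k, and None otherwise.
    """
    m = len(pattern)
    window = deque(maxlen=m)  # last m characters of the stream: O(m) space
    for i, ch in enumerate(text):
        window.append(ch)
        if len(window) < m:
            continue  # no complete alignment ends here yet
        mismatches = 0
        for p, t in zip(pattern, window):
            if p != t:
                mismatches += 1
                if mismatches > k:
                    break  # early exit: the exact distance is not needed
        yield i, (mismatches if mismatches <= k else None)
```

For example, with pattern `"abc"`, text `"abcabd"`, and $k=1$, the windows ending at positions 2 and 5 are reported with distances 0 and 1, while the windows ending at positions 3 and 4 exceed the threshold.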
We address these gaps by designing a randomized streaming algorithm for the $k$-mismatch problem that, given an integer parameter $k\le s \le m$, uses $\tilde O(s)$ space and costs $\tilde O\big(n+\min\big(\frac {nk^2}m,\frac{nk}{\sqrt s},\frac{\sigma nm}s\big)\big)$ total time. For $s=m$, the total runtime becomes $\tilde O\big(n + \min\big(\frac{nk}{\sqrt m},\sigma n\big)\big)$, which matches the time cost of the fastest offline algorithm. Moreover, the worst-case time cost per character is still $\tilde O\big(\sqrt k\big)$.
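The claim for $s=m$ can be checked by direct substitution; the short case analysis below is a sketch of that check and is not taken from the paper. Setting $s=m$ turns the third term $\frac{\sigma nm}{s}$ into $\sigma n$, so the bound reads

$$
\tilde O\Big(n+\min\Big(\frac{nk^2}{m},\ \frac{nk}{\sqrt m},\ \sigma n\Big)\Big).
$$

The first term can be dropped without changing the bound by more than a constant factor. If $k\le\sqrt m$, then $\frac{nk^2}{m}\le\frac{nk}{\sqrt m}\le n$, so both $n+\min\big(\frac{nk^2}{m},\frac{nk}{\sqrt m},\sigma n\big)$ and $n+\min\big(\frac{nk}{\sqrt m},\sigma n\big)$ lie between $n$ and $2n$. If $k>\sqrt m$, then $\frac{nk^2}{m}=\frac{nk}{\sqrt m}\cdot\frac{k}{\sqrt m}>\frac{nk}{\sqrt m}$, so the first term never attains the minimum. In both cases

$$
n+\min\Big(\frac{nk^2}{m},\ \frac{nk}{\sqrt m},\ \sigma n\Big)=\Theta\Big(n+\min\Big(\frac{nk}{\sqrt m},\ \sigma n\Big)\Big).
$$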