Computing Convolution on Grammar-Compressed Text

Toshiya Tanaka, T. I., Shunsuke Inenaga, H. Bannai, M. Takeda
Published in: 2013 Data Compression Conference, 2013-03-15. DOI: 10.1109/DCC.2013.53 (https://doi.org/10.1109/DCC.2013.53)
Citations: 13

Abstract

The convolution between a text string S of length N and a pattern string P of length m can be computed in O(N log m) time by FFT. It is known that various types of approximate string matching problems are reducible to convolution. In this paper, we assume that the input text string is given in a compressed form, as a straight-line program (SLP), which is a context-free grammar in Chomsky normal form that derives a single string. Given an SLP of size n that describes a text S of length N, and an uncompressed pattern P of length m, we present a simple O(nm log m)-time algorithm to compute the convolution between S and P. We then show that this can be improved to O(min{nm, N - α} log m) time, where α ≥ 0 is a value that represents the amount of redundancy that the SLP captures with respect to the length-m substrings. The key to the improvement is our new algorithm that computes the convolution between a trie of size r and a pattern string P of length m in O(r log m) time.
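As a rough illustration of the uncompressed baseline that the abstract starts from, the sketch below decompresses a small SLP and then counts character matches at every alignment with an FFT-based convolution, processing the text in overlapping blocks of length 2m - 1 so that the total cost is O(N log m). It is not the paper's algorithm for compressed input. The convolution is taken in its cross-correlation form, c[i] = Σ_j T[i+j]·P[j], which is an assumption about the definition, and the rule encoding and the names `expand`, `correlate`, and `match_counts` are hypothetical helpers chosen for this example. The pattern is assumed to be non-empty.

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

def multiply(x, y):
    """Linear convolution of two integer sequences via FFT."""
    size = 1
    while size < len(x) + len(y) - 1:
        size *= 2
    fx = fft([complex(v) for v in x] + [0j] * (size - len(x)))
    fy = fft([complex(v) for v in y] + [0j] * (size - len(y)))
    prod = fft([a * b for a, b in zip(fx, fy)], invert=True)
    # The inverse transform needs a final division by the transform size.
    return [round(v.real / size) for v in prod[:len(x) + len(y) - 1]]

def correlate(text, pattern):
    """c[i] = sum_j text[i+j] * pattern[j] for 0 <= i <= N - m."""
    n, m = len(text), len(pattern)
    rev = pattern[::-1]
    out = []
    # Each block of length at most 2m - 1 yields up to m results.
    for start in range(0, n - m + 1, m):
        block = text[start:start + 2 * m - 1]
        full = multiply(block, rev)
        valid = len(block) - m + 1
        out.extend(full[m - 1:m - 1 + valid])
    return out

def match_counts(text, pattern):
    """Number of matching characters at every alignment of pattern in text."""
    m = len(pattern)
    total = [0] * max(len(text) - m + 1, 0)
    for c in set(pattern):
        # One 0/1 indicator vector per character, one convolution each.
        t = [1 if ch == c else 0 for ch in text]
        p = [1 if ch == c else 0 for ch in pattern]
        for i, v in enumerate(correlate(t, p)):
            total[i] += v
    return total

def expand(rules, symbol):
    """Decompress an SLP: each rule is a single character or a pair of symbols."""
    rule = rules[symbol]
    if isinstance(rule, str):
        return rule
    left, right = rule
    return expand(rules, left) + expand(rules, right)

# Hypothetical SLP in Chomsky normal form with 6 rules deriving "abaababa".
rules = {
    "X1": "a", "X2": "b",
    "X3": ("X1", "X2"), "X4": ("X3", "X1"),
    "X5": ("X4", "X3"), "X6": ("X5", "X4"),
}
text = expand(rules, "X6")
counts = match_counts(text, "aba")
```

Here `counts` holds one entry per alignment, and an entry equal to m marks an exact occurrence of the pattern. Decompressing first always costs time proportional to N, which is the cost that working directly on the SLP is meant to avoid when the text is highly compressible.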