Computing Convolution on Grammar-Compressed Text

2013 Data Compression Conference. Pub Date: 2013-03-15. DOI: 10.1109/DCC.2013.53

Toshiya Tanaka, T. I., Shunsuke Inenaga, H. Bannai, M. Takeda


Citations: 13

Abstract

The convolution between a text string S of length N and a pattern string P of length m can be computed in O(N log m) time using the FFT. Various types of approximate string matching problems are known to be reducible to convolution. In this paper, we assume that the input text is given in compressed form as a straight-line program (SLP), which is a context-free grammar in Chomsky normal form that derives a single string. Given an SLP of size n describing a text S of length N, and an uncompressed pattern P of length m, we present a simple O(nm log m)-time algorithm to compute the convolution between S and P. We then show that this can be improved to O(min{nm, N - α} log m) time, where α ≥ 0 is a value that represents the amount of redundancy that the SLP captures with respect to the length-m substrings. The key to the improvement is a new algorithm that computes the convolution between a trie of size r and a pattern string P of length m in O(r log m) time.
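To make the objects in the abstract concrete, the following Python sketch shows a tiny SLP, its expansion into the text it derives, and an FFT-based convolution used to count matching characters at every alignment of a pattern. It is only an illustrative baseline under stated assumptions, not the algorithm of the paper: it fully decompresses the SLP and runs a single FFT over the whole text, which costs O(N log N) per character class rather than the O(N log m) obtained by processing the text in overlapping blocks of length about 2m. The rule encoding (a dict mapping a variable to a character or to a pair of variables) and the helper names `fft`, `correlate`, `expand`, and `match_counts` are hypothetical choices made for this example.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def correlate(s, p):
    """Return c[i] = sum_j s[i + j] * p[j] for 0 <= i <= len(s) - len(p).

    Inputs are integer sequences; the result is rounded back to integers.
    """
    n, m = len(s), len(p)
    size = 1
    while size < n + m - 1:
        size *= 2
    fa = fft([complex(x) for x in s] + [0j] * (size - n))
    # Reversing the pattern turns polynomial multiplication into correlation.
    fb = fft([complex(x) for x in reversed(p)] + [0j] * (size - m))
    prod = fft([x * y for x, y in zip(fa, fb)], invert=True)
    return [round(prod[i + m - 1].real / size) for i in range(n - m + 1)]


def expand(rules, start):
    """Derive the single string of an SLP given as {var: char | (left, right)}."""
    memo = {}

    def go(x):
        if x not in memo:
            rhs = rules[x]
            memo[x] = rhs if isinstance(rhs, str) else go(rhs[0]) + go(rhs[1])
        return memo[x]

    return go(start)


def match_counts(text, pattern):
    """Number of matching characters at each alignment of pattern in text."""
    total = [0] * (len(text) - len(pattern) + 1)
    for ch in set(pattern):
        # One convolution per character, on 0/1 indicator vectors.
        s = [1 if c == ch else 0 for c in text]
        p = [1 if c == ch else 0 for c in pattern]
        total = [t + c for t, c in zip(total, correlate(s, p))]
    return total


# A small SLP in Chomsky normal form (hypothetical example): 6 rules, text length 8.
rules = {
    "X1": "a",
    "X2": "b",
    "X3": ("X1", "X2"),  # ab
    "X4": ("X3", "X1"),  # aba
    "X5": ("X4", "X3"),  # abaab
    "X6": ("X5", "X4"),  # abaababa
}
text = expand(rules, "X6")
print(text)                       # abaababa
print(match_counts(text, "aba"))  # [3, 1, 1, 3, 0, 3]
```

An alignment whose count equals the pattern length (here 3) is an exact occurrence, and smaller counts give the Hamming-style similarity that the approximate matching reductions rely on. The compressed-domain algorithms described in the abstract avoid the call to `expand` altogether.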

