T. Kida, M. Takeda, A. Shinohara, Masamichi Miyazaki, S. Arikawa
Title: Multiple pattern matching in LZW compressed text
Venue: Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225)
Published: 1998-03-30
DOI: 10.1109/DCC.1998.672136 (https://doi.org/10.1109/DCC.1998.672136)
Citations: 69
Abstract
We address the problem of searching directly in LZW compressed text, and present a new algorithm for finding multiple patterns by simulating the moves of the Aho-Corasick (1975) pattern matching machine. The new algorithm finds all occurrences of multiple patterns, whereas the algorithm proposed by Amir, Benson, and Farach (see Journal of Computer and System Sciences, vol. 52, pp. 299-307, 1996) finds only the first occurrence of a single pattern. The new algorithm runs in $O(n + m^2 + r)$ time using $O(n + m^2)$ space, where $n$ is the length of the compressed text, $m$ is the total length of the patterns, and $r$ is the number of occurrences of the patterns. We implemented a simple version of the algorithm and showed that it is approximately twice as fast as decompression followed by a search using the Aho-Corasick machine.
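To make the idea of "simulating the Aho-Corasick machine on LZW codes" concrete, here is a minimal Python sketch. It is not the algorithm from the paper and does not achieve the stated $O(n + m^2 + r)$ bound: it naively memoizes, for each (state, code) pair, the state reached and the matches found while reading that code's phrase. All function names (`build_ac`, `step`, `lzw_compress`, `search_compressed`) and the fixed-alphabet LZW variant are assumptions made for illustration.

```python
from collections import deque

def build_ac(patterns):
    """Build Aho-Corasick goto, failure, and output tables."""
    goto, fail, out = [{}], [0], [[]]
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({}); fail.append(0); out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(p)
    queue = deque(goto[0].values())          # depth-1 states fail to the root
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] = out[t] + out[fail[t]]   # inherit matches ending here
            queue.append(t)
    return goto, fail, out

def step(ac, s, ch):
    """One move of the machine from state s on character ch."""
    goto, fail, _ = ac
    while s and ch not in goto[s]:
        s = fail[s]
    return goto[s].get(ch, 0)

def lzw_compress(text, alphabet):
    """Plain LZW with an unbounded dictionary (illustrative variant)."""
    table = {ch: i for i, ch in enumerate(alphabet)}
    codes, w = [], ""
    for ch in text:
        if w + ch in table:
            w += ch
        else:
            codes.append(table[w])
            table[w + ch] = len(table)
            w = ch
    if w:
        codes.append(table[w])
    return codes

def search_compressed(codes, alphabet, patterns):
    """Return (start, pattern) pairs without materializing the text."""
    ac = build_ac(patterns)
    out = ac[2]
    # Each phrase is stored as parent code + last char, never as a string.
    parent = [None] * len(alphabet)
    last, first = list(alphabet), list(alphabet)
    length = [1] * len(alphabet)
    memo = {}  # (state, code) -> (next state, [(end offset in phrase, pattern)])

    def jump(s, c):
        chain = []
        while c is not None and (s, c) not in memo:
            chain.append(c)
            c = parent[c]
        t, hits = memo[(s, c)] if c is not None else (s, [])
        for c in reversed(chain):            # extend prefix by one char at a time
            t = step(ac, t, last[c])
            hits = hits + [(length[c], p) for p in out[t]]
            memo[(s, c)] = (t, hits)
        return t, hits

    state, pos, prev, found = 0, 0, None, []
    for c in codes:
        if prev is not None:                 # rebuild the dictionary entry
            ch = first[c] if c < len(parent) else first[prev]
            parent.append(prev); last.append(ch)
            first.append(first[prev]); length.append(length[prev] + 1)
        nxt, hits = jump(state, c)
        found.extend((pos + end - len(p), p) for end, p in hits)
        state, pos, prev = nxt, pos + length[c], c
    return found

codes = lzw_compress("abababcabcab", "abc")
print(sorted(search_compressed(codes, "abc", ["ab", "abc", "bca"])))
```

The memo table here can grow with the number of (state, code) pairs actually visited, and match lists are copied per phrase prefix, so this sketch trades the paper's space and time guarantees for brevity. It only illustrates why repeated phrases let the search skip re-reading characters.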