{"title":"使用压缩模式的字符串索引","authors":"Philip Bille, Inge Li Gørtz, Teresa Anna Steiner","doi":"https://dl.acm.org/doi/10.1145/3607141","DOIUrl":null,"url":null,"abstract":"<p>Given a string <i>S</i> of length <i>n</i>, the classic string indexing problem is to preprocess <i>S</i> into a compact data structure that supports efficient subsequent pattern queries. In this paper we consider the basic variant where the pattern is given in compressed form and the goal is to achieve query time that is fast in terms of the compressed size of the pattern. This captures the common client-server scenario, where a client submits a query and communicates it in compressed form to a server. Instead of the server decompressing the query before processing it, we consider how to efficiently process the compressed query directly. Our main result is a novel linear space data structure that achieves near-optimal query time for patterns compressed with the classic Lempel-Ziv 1977 (LZ77) compression scheme. Along the way we develop several data structural techniques of independent interest, including a novel data structure that compactly encodes all LZ77 compressed suffixes of a string in linear space and a general decomposition of tries that reduces the search time from logarithmic in the size of the trie to logarithmic in the length of the pattern.</p>","PeriodicalId":50922,"journal":{"name":"ACM Transactions on Algorithms","volume":"7 14","pages":""},"PeriodicalIF":0.9000,"publicationDate":"2023-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"String Indexing with Compressed Patterns\",\"authors\":\"Philip Bille, Inge Li Gørtz, Teresa Anna Steiner\",\"doi\":\"https://dl.acm.org/doi/10.1145/3607141\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Given a string <i>S</i> of length <i>n</i>, the classic string indexing problem is to preprocess <i>S</i> into a compact data structure that supports efficient subsequent pattern queries. In this paper we consider the basic variant where the pattern is given in compressed form and the goal is to achieve query time that is fast in terms of the compressed size of the pattern. This captures the common client-server scenario, where a client submits a query and communicates it in compressed form to a server. Instead of the server decompressing the query before processing it, we consider how to efficiently process the compressed query directly. Our main result is a novel linear space data structure that achieves near-optimal query time for patterns compressed with the classic Lempel-Ziv 1977 (LZ77) compression scheme. Along the way we develop several data structural techniques of independent interest, including a novel data structure that compactly encodes all LZ77 compressed suffixes of a string in linear space and a general decomposition of tries that reduces the search time from logarithmic in the size of the trie to logarithmic in the length of the pattern.</p>\",\"PeriodicalId\":50922,\"journal\":{\"name\":\"ACM Transactions on Algorithms\",\"volume\":\"7 14\",\"pages\":\"\"},\"PeriodicalIF\":0.9000,\"publicationDate\":\"2023-07-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Algorithms\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/https://dl.acm.org/doi/10.1145/3607141\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, THEORY & METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Algorithms","FirstCategoryId":"94","ListUrlMain":"https://doi.org/https://dl.acm.org/doi/10.1145/3607141","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
Given a string S of length n, the classic string indexing problem is to preprocess S into a compact data structure that supports efficient subsequent pattern queries. In this paper we consider the basic variant where the pattern is given in compressed form and the goal is to achieve query time that is fast in terms of the compressed size of the pattern. This captures the common client-server scenario, where a client submits a query and communicates it in compressed form to a server. Instead of the server decompressing the query before processing it, we consider how to efficiently process the compressed query directly. Our main result is a novel linear space data structure that achieves near-optimal query time for patterns compressed with the classic Lempel-Ziv 1977 (LZ77) compression scheme. Along the way we develop several data structural techniques of independent interest, including a novel data structure that compactly encodes all LZ77 compressed suffixes of a string in linear space and a general decomposition of tries that reduces the search time from logarithmic in the size of the trie to logarithmic in the length of the pattern.
期刊介绍:
ACM Transactions on Algorithms welcomes submissions of original research of the highest quality dealing with algorithms that are inherently discrete and finite, and having mathematical content in a natural way, either in the objective or in the analysis. Most welcome are new algorithms and data structures, new and improved analyses, and complexity results. Specific areas of computation covered by the journal include
combinatorial searches and objects;
counting;
discrete optimization and approximation;
randomization and quantum computation;
parallel and distributed computation;
algorithms for
graphs,
geometry,
arithmetic,
number theory,
strings;
on-line analysis;
cryptography;
coding;
data compression;
learning algorithms;
methods of algorithmic analysis;
discrete algorithms for application areas such as
biology,
economics,
game theory,
communication,
computer systems and architecture,
hardware design,
scientific computing