{"title":"Document Listing on Repetitive Collections with Guaranteed Performance","authors":"G. Navarro","doi":"10.4230/LIPIcs.CPM.2017.4","DOIUrl":null,"url":null,"abstract":"We consider document listing on string collections, that is, finding in which strings a given pattern appears. In particular, we focus on repetitive collections: a collection of size N over alphabet [1,a] is composed of D copies of a string of size n, and s single-character edits are applied on the copies. We introduce the first document listing index with size O~(n + s), precisely O((n lg a + s lg^2 N) lg D) bits, and with useful worst-case time guarantees: Given a pattern of length m, the index reports the ndoc strings where it appears in time O(m^2 + m lg N (lg D + lg^e N) ndoc), for any constant e > 0.","PeriodicalId":236737,"journal":{"name":"Annual Symposium on Combinatorial Pattern Matching","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"21","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Annual Symposium on Combinatorial Pattern Matching","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.4230/LIPIcs.CPM.2017.4","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 21
Abstract
We consider document listing on string collections, that is, finding in which strings a given pattern appears. In particular, we focus on repetitive collections: a collection of size N over alphabet [1,a] is composed of D copies of a string of size n, and s single-character edits are applied on the copies. We introduce the first document listing index with size O~(n + s), precisely O((n lg a + s lg^2 N) lg D) bits, and with useful worst-case time guarantees: Given a pattern of length m, the index reports the ndoc strings where it appears in time O(m^2 + m lg N (lg D + lg^e N) ndoc), for any constant e > 0.
我们考虑字符串集合上的文档列表,即查找给定模式出现在哪个字符串中。我们特别关注重复集合:一个大小为N /字母表[1,a]的集合由大小为N的字符串的D个副本组成,并且在副本上应用了s个单字符编辑。我们引入第一个文档列表索引,其大小为O~(n + s),精确地为O((n lg a + s lg^2 n) lg D)位,并具有有用的最坏情况时间保证:给定长度为m的模式,索引报告在时间为O(m^2 + m lg n (lg D + lg^e n) ndoc)时出现的ndoc字符串,对于任何常数e > 0。