Parameterized Text Indexing with One Wildcard

A. Ganguly, W. Hon, Yu-An Huang, S. Pissis, R. Shah, Sharma V. Thankachan
Published in: 2019 Data Compression Conference (DCC), 2019-03-26
DOI: 10.1109/DCC.2019.00023 (https://doi.org/10.1109/DCC.2019.00023)
Citations: 2

Abstract

Two equal-length strings X and Y over an alphabet Σ of size σ are a parameterized match iff X can be transformed into Y by renaming each character X[i] to Y[i], for 1 ≤ i ≤ |X|, using a one-to-one function from the set of characters in X to the set of characters in Y. The parameterized text indexing problem is defined as follows: index a text T of n characters over an alphabet Σ of size σ such that, whenever a pattern P[1, p] arrives as a query, we can report all occ parameterized occurrences of P in T. A position i ∈ [1, n] is a parameterized occurrence of P in T iff P and T[i, (i+p-1)] are a parameterized match. We study a generalization of this problem in which the pattern contains one wildcard character ϕ ∉ Σ that matches any character in Σ. Thus, for a pattern P[1, p] = P_1ϕP_2, the task is to report all positions i in T such that the string P_1P_2 and the string obtained by concatenating T[i, (i+|P_1|-1)] and T[(i+|P_1|+1), (i+p-1)] are a parameterized match. We show that such queries can be answered in optimal O(p + occ) time per query using an O(n log n) space index. We then show how to compress our index into O(n log σ) space, at a higher query cost of O(p(log log n + log σ) + occ log σ).
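
To make the definitions in the abstract concrete, here is a minimal Python sketch of the query semantics only. It is a naive scan that takes O(n·p) time per query, not the index described in the paper. The function names `is_pmatch` and `wildcard_occurrences` and the choice of `?` as the wildcard symbol are assumptions made for illustration.

```python
from typing import List


def is_pmatch(x: str, y: str) -> bool:
    """Return True iff a one-to-one renaming of characters turns x into y."""
    if len(x) != len(y):
        return False
    forward, backward = {}, {}  # x-char -> y-char, and y-char -> x-char
    for a, b in zip(x, y):
        # Both maps must stay consistent, otherwise the renaming is not one-to-one.
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True


def wildcard_occurrences(text: str, pattern: str, wildcard: str = "?") -> List[int]:
    """Report 1-based positions i where pattern = P_1 ? P_2 occurs in text.

    The text character aligned with the wildcard is skipped; the remaining
    characters of the window must be a parameterized match with P_1 P_2.
    """
    if pattern.count(wildcard) != 1:
        raise ValueError("pattern must contain exactly one wildcard")
    k = pattern.index(wildcard)          # k = |P_1|
    p1p2 = pattern[:k] + pattern[k + 1:]
    p = len(pattern)
    hits = []
    for i in range(len(text) - p + 1):   # naive scan over all windows
        window = text[i:i + p]
        if is_pmatch(p1p2, window[:k] + window[k + 1:]):
            hits.append(i + 1)           # positions are 1-based, as in the abstract
    return hits


if __name__ == "__main__":
    print(wildcard_occurrences("abcabba", "xy?x"))
```

For example, with the pattern `xy?x` the string P_1P_2 is `xyx`, and the window `abca` of the text qualifies because dropping its third character leaves `aba`, which is `xyx` under the renaming x→a, y→b.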