Annual Symposium on Combinatorial Pattern Matching: latest papers

MONI can find k-MEMs
Annual Symposium on Combinatorial Pattern Matching Pub Date: 2022-02-10 DOI: 10.4230/LIPIcs.CPM.2023.26
T. Gagie
Abstract: Suppose we are asked to index a text $T[0..n-1]$ such that, given a pattern $P[0..m-1]$, we can quickly report the maximal substrings of $P$ that each occur in $T$ at least $k$ times. We first show how we can add $O(r \log n)$ bits to Rossi et al.'s recent MONI index, where $r$ is the number of runs in the Burrows-Wheeler Transform of $T$, such that it supports such queries in $O(k m \log n)$ time. We then show how, if we are given $k$ at construction time, we can reduce the query time to $O(m \log n)$.
Citations: 3
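To make the query in the MONI abstract above concrete, here is a brute-force reference in Python. It is only a sketch for checking small examples and is not the index described in the paper: it rescans the text for every candidate substring. The function names `count_occurrences` and `k_mems` are hypothetical, occurrences are counted with overlaps allowed, and $k \geq 1$ is assumed.

```python
def count_occurrences(text: str, sub: str) -> int:
    """Count occurrences of a non-empty sub in text, allowing overlaps."""
    count, start = 0, text.find(sub)
    while start != -1:
        count += 1
        start = text.find(sub, start + 1)
    return count


def k_mems(text: str, pattern: str, k: int) -> list[tuple[int, int]]:
    """Return (start, length) pairs of the maximal substrings of pattern
    that occur at least k times in text (brute force, assumes k >= 1)."""
    results = []
    prev_end = 0  # exclusive end of the longest match starting one position earlier
    for i in range(len(pattern)):
        length = 0
        # Extend to the right while the substring still occurs at least k times.
        while (i + length < len(pattern)
               and count_occurrences(text, pattern[i:i + length + 1]) >= k):
            length += 1
        end = i + length
        # The match is left-maximal only if it reaches past the previous match.
        if length > 0 and end > prev_end:
            results.append((i, length))
        prev_end = end
    return results
```

For instance, `k_mems("banana", "ana", 2)` reports the whole pattern, since "ana" occurs twice in "banana" when overlaps are counted.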
The Normalized Edit Distance with Uniform Operation Costs is a Metric
Annual Symposium on Combinatorial Pattern Matching Pub Date: 2022-01-16 DOI: 10.4230/LIPIcs.CPM.2022.17
D. Fisman, Joshua Grogin, Oded Margalit, Gera Weiss
Abstract: We prove that the normalized edit distance proposed in [Marzal and Vidal 1993] is a metric when the costs of all the edit operations are the same. This closes a long-standing gap in the literature, where several authors noted that this distance does not satisfy the triangle inequality in the general case, and that it was not known whether it is satisfied in the uniform case, where all the edit costs are equal. We compare this metric to two normalized metrics proposed as alternatives in the literature, when people thought that Marzal and Vidal's distance is not a metric, and identify key properties that explain why the original distance, now known to also be a metric, is better for some applications. Our examination is from the point of view of formal verification, but the properties and their significance are stated in an application-agnostic way.
Citations: 3
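The distance discussed in the abstract above can be sketched with a small dynamic program. The sketch rests on stated assumptions rather than on the paper's text: the normalized edit distance is taken to be the minimum, over all edit paths, of the path weight divided by the path length; the path length counts every operation, including matches; and insertions, deletions, and substitutions each cost 1 while matches cost 0. The function name `normalized_edit_distance` is hypothetical.

```python
def normalized_edit_distance(x: str, y: str) -> float:
    """Minimum over edit paths of weight / length, with unit operation costs."""
    m, n = len(x), len(y)
    if m == 0 and n == 0:
        return 0.0  # convention chosen here for the empty path
    inf = float("inf")
    max_len = m + n  # no edit path uses more than m + n operations
    # best[i][j][k]: minimum weight of a path with exactly k operations
    # that turns x[:i] into y[:j].
    best = [[[inf] * (max_len + 1) for _ in range(n + 1)] for _ in range(m + 1)]
    best[0][0][0] = 0
    for i in range(m + 1):
        for j in range(n + 1):
            for k in range(1, i + j + 1):
                cand = inf
                if i > 0:
                    cand = min(cand, best[i - 1][j][k - 1] + 1)  # delete x[i-1]
                if j > 0:
                    cand = min(cand, best[i][j - 1][k - 1] + 1)  # insert y[j-1]
                if i > 0 and j > 0:
                    cost = 0 if x[i - 1] == y[j - 1] else 1  # match or substitute
                    cand = min(cand, best[i - 1][j - 1][k - 1] + cost)
                best[i][j][k] = cand
    return min(best[m][n][k] / k
               for k in range(1, max_len + 1) if best[m][n][k] < inf)
```

The table has $(m+1)(n+1)(m+n+1)$ cells, so this is suitable only for short strings; it is meant as a way to spot-check properties such as the triangle inequality on small inputs.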
Arbitrary-length analogs to de Bruijn sequences
Annual Symposium on Combinatorial Pattern Matching Pub Date: 2021-08-17 DOI: 10.4230/LIPIcs.CPM.2022.9
Abhinav Nellore, Rachel A. Ward
Abstract: Let $\widetilde{\alpha}$ be a length-$L$ cyclic sequence of characters from a size-$K$ alphabet $\mathcal{A}$ such that the number of occurrences of any length-$m$ string on $\mathcal{A}$ as a substring of $\widetilde{\alpha}$ is $\lfloor L / K^m \rfloor$ or $\lceil L / K^m \rceil$. When $L = K^N$ for any positive integer $N$, $\widetilde{\alpha}$ is a de Bruijn sequence of order $N$, and when $L \neq K^N$, $\widetilde{\alpha}$ shares many properties with de Bruijn sequences. We describe an algorithm that outputs some $\widetilde{\alpha}$ for any combination of $K \geq 2$ and $L \geq 1$ in $O(L)$ time using $O(L \log K)$ space. This algorithm extends Lempel's recursive construction of a binary de Bruijn sequence. An implementation written in Python is available at https://github.com/nelloreward/pkl.
Citations: 1
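The balance property stated in the abstract above can be checked directly on small inputs. The following Python sketch is a verifier only, not the construction algorithm from the paper or from the linked repository. The function name `is_balanced_cyclic` is hypothetical, and restricting the window length $m$ to $1 \le m \le L$ is an assumption made here to keep the check finite.

```python
from collections import Counter


def is_balanced_cyclic(seq: str, alphabet: str) -> bool:
    """Check that every length-m string over alphabet occurs floor(L/K^m) or
    ceil(L/K^m) times as a cyclic substring of seq, for m = 1..L."""
    K, L = len(alphabet), len(seq)
    if not set(seq) <= set(alphabet):
        return False
    for m in range(1, L + 1):
        doubled = seq + seq[:m - 1]  # unroll the cycle so windows can wrap around
        counts = Counter(doubled[i:i + m] for i in range(L))
        lo = L // K**m
        hi = -(-L // K**m)  # ceiling division
        if any(c < lo or c > hi for c in counts.values()):
            return False
        # If the floor is at least 1, every one of the K^m strings must appear.
        if lo >= 1 and len(counts) != K**m:
            return False
    return True
```

With the binary alphabet, `is_balanced_cyclic("0011", "01")` holds because "0011" is a de Bruijn sequence of order 2, while "0001" fails already at $m = 1$.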
Ranking Bracelets in Polynomial Time
Annual Symposium on Combinatorial Pattern Matching Pub Date: 2021-04-09 DOI: 10.4230/LIPIcs.CPM.2021.4
Duncan Adamson, Argyrios Deligkas, V. Gusev, I. Potapov
Abstract: The main result of the paper is the first polynomial-time algorithm for ranking bracelets. The time complexity of the algorithm is O(k^2 n^4), where k is the size of the alphabet and n is the length of the considered bracelets. The key part of the algorithm is to compute the rank of any word with respect to the set of bracelets by finding three other ranks: the rank over all necklaces, the rank over palindromic necklaces, and the rank over enclosing apalindromic necklaces. The last two concepts are introduced in this paper. These ranks are the key components that let our algorithm decompose the problem into parts. Additionally, this ranking procedure is used to build a polynomial-time unranking algorithm.
Citations: 8
A Linear Time Algorithm for Constructing Hierarchical Overlap Graphs
Annual Symposium on Combinatorial Pattern Matching Pub Date: 2021-02-25 DOI: 10.4230/LIPIcs.CPM.2021.22
Sangsoo Park, Sung Gwan Park, Bastien Cazaux, Kunsoo Park, Eric Rivals
Abstract: The hierarchical overlap graph (HOG) is a graph that encodes overlaps from a given set P of n strings, as the overlap graph does. The best known algorithm constructs HOG in O(||P|| log n) time and O(||P||) space, where ||P|| is the sum of lengths of the strings in P. In this paper we present a new algorithm to construct HOG in O(||P||) time and space. Hence, the construction time and space of HOG are better than those of the overlap graph, which are O(||P|| + n²).
Citations: 5
Revisiting the Parameterized Complexity of Maximum-Duo Preservation String Mapping
Annual Symposium on Combinatorial Pattern Matching Pub Date: 2020-12-01 DOI: 10.4230/LIPIcs.CPM.2017.11
Christian Komusiewicz, Mateus de Oliveira Oliveira, M. Zehavi
Abstract: In the Maximum-Duo Preservation String Mapping (Max-Duo PSM) problem, the input consists of two related strings A and B of length n and a nonnegative integer k. The objective is to determine whether there exists a mapping m from the set of positions of A to the set of positions of B that maps only to positions with the same character and preserves at least k duos, which are pairs of adjacent positions. We develop a randomized algorithm that solves Max-Duo PSM in $4^k \cdot n^{O(1)}$ time, and a deterministic algorithm that solves this problem in $6.855^k \cdot n^{O(1)}$ time. The previous best known (deterministic) algorithm for this problem has $(8e)^{2k+o(k)} \cdot n^{O(1)}$ running time [Beretta et al. (2016) [1], [2]]. We also show that Max-Duo PSM admits a problem kernel of size $O(k^3)$, improving upon the previous best known problem kernel of size $O(k^6)$.
Citations: 4
AWLCO: All-Window Length Co-Occurrence
Annual Symposium on Combinatorial Pattern Matching Pub Date: 2020-11-29 DOI: 10.4230/LIPIcs.CPM.2021.24
Joshua Sobel, Noah Bertram, C. Ding, F. Nargesian, D. Gildea
Abstract: Analyzing patterns in a sequence of events has applications in text analysis, computer programming, and genomics research. In this paper, we consider the all-window-length analysis model, which analyzes a sequence of events with respect to windows of all lengths. We study the exact co-occurrence counting problem for the all-window-length analysis model. Our first algorithm is an offline algorithm that counts all-window-length co-occurrences by performing multiple passes over a sequence and computing single-window-length co-occurrences. This algorithm has time complexity $O(n)$ for each window length, and thus a total time complexity of $O(n^2)$, and space complexity $O(|I|)$ for a sequence of size $n$ and an itemset of size $|I|$. We propose AWLCO, an online algorithm that computes all-window-length co-occurrences in a single pass with expected time complexity $O(n)$ and space complexity $O(\sqrt{n|I|})$. Following this, we generalize our use case to patterns, for which we propose an algorithm that computes all-window-length co-occurrence with expected time complexity $O(n|I|)$ and space complexity $O(\sqrt{n|I|} + e_{max}|I|)$, where $e_{max}$ is the length of the largest pattern.
Citations: 1
The Longest Run Subsequence Problem: Further Complexity Results
Annual Symposium on Combinatorial Pattern Matching Pub Date: 2020-11-16 DOI: 10.4230/LIPIcs.CPM.2021.14
R. Dondi, F. Sikora
Abstract: Longest Run Subsequence is a problem introduced recently in the context of the scaffolding phase of genome assembly (Schrinner et al., WABI 2020). The problem asks for a maximum-length subsequence of a given string that contains at most one run for each symbol (a run is a maximal substring of consecutive identical symbols). The problem has been shown to be NP-hard and to be fixed-parameter tractable when the parameter is the size of the alphabet on which the input string is defined. In this paper we further investigate the complexity of the problem and show that it is fixed-parameter tractable when parameterized by the number of runs in a solution, a smaller parameter. Moreover, we investigate the kernelization complexity of Longest Run Subsequence and prove that it does not admit a polynomial kernel when parameterized by the size of the alphabet or by the number of runs. Finally, we consider the restriction of Longest Run Subsequence in which each symbol has at most two occurrences in the input string and show that it is APX-hard.
Citations: 2
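The problem definition in the abstract above can be illustrated with a simple exact solver. This Python sketch is not the algorithm from the paper: it is a straightforward dynamic program over sets of already-used symbols, so its running time is exponential in the alphabet size. The function name `longest_run_subsequence` is hypothetical.

```python
def longest_run_subsequence(s: str) -> int:
    """Length of a longest subsequence of s with at most one run per symbol."""
    # State: (symbols that already have a run, symbol of the current run).
    # Value: best subsequence length reaching that state.
    states = {(frozenset(), None): 0}
    for c in s:
        updated = dict(states)  # skipping c keeps every existing state
        for (used, cur), length in states.items():
            if c == cur:
                key = (used, cur)        # extend the current run
            elif c not in used:
                key = (used | {c}, c)    # start the single allowed run of c
            else:
                continue                 # c already had its run; taking it is illegal
            if updated.get(key, -1) < length + 1:
                updated[key] = length + 1
        states = updated
    return max(states.values())
```

For the input "abab" the answer is 3, achieved for example by "aab" or "abb"; taking all four characters would give the symbol "a" two separate runs.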
String Sanitization Under Edit Distance: Improved and Generalized
Annual Symposium on Combinatorial Pattern Matching Pub Date: 2020-07-16 DOI: 10.4230/LIPIcs.CPM.2021.19
Takuya Mieno, S. Pissis, L. Stougie, Michelle Sweering
Abstract: Let $W$ be a string of length $n$ over an alphabet $\Sigma$, $k$ be a positive integer, and $\mathcal{S}$ be a set of length-$k$ substrings of $W$. The ETFS problem asks us to construct a string $X_{\mathrm{ED}}$ such that: (i) no string of $\mathcal{S}$ occurs in $X_{\mathrm{ED}}$; (ii) the order of all other length-$k$ substrings over $\Sigma$ is the same in $W$ and in $X_{\mathrm{ED}}$; and (iii) $X_{\mathrm{ED}}$ has minimal edit distance to $W$. When $W$ represents an individual's data and $\mathcal{S}$ represents a set of confidential patterns, the ETFS problem asks for transforming $W$ to preserve its privacy and its utility [Bernardini et al., ECML PKDD 2019]. ETFS can be solved in $\mathcal{O}(n^2k)$ time [Bernardini et al., CPM 2020]. The same paper shows that ETFS cannot be solved in $\mathcal{O}(n^{2-\delta})$ time, for any $\delta>0$, unless the Strong Exponential Time Hypothesis (SETH) is false. Our main results can be summarized as follows: (i) an $\mathcal{O}(n^2\log^2k)$-time algorithm to solve ETFS; and (ii) an $\mathcal{O}(n^2\log^2n)$-time algorithm to solve AETFS, a generalization of ETFS in which the elements of $\mathcal{S}$ can have arbitrary lengths. Our algorithms are thus optimal up to polylogarithmic factors, unless SETH fails. Let us also stress that our algorithms work under edit distance with arbitrary weights at no extra cost. As a bonus, we show how to modify some known techniques, which speed up the standard edit distance computation, to be applied to our problems. Beyond string sanitization, our techniques may inspire solutions to other problems related to regular expressions or context-free grammars.
Citations: 1
String Sanitization Under Edit Distance
Annual Symposium on Combinatorial Pattern Matching Pub Date: 2020-06-09 DOI: 10.4230/LIPIcs.CPM.2020.7
G. Bernardini, Huiping Chen, G. Loukides, N. Pisanti, S. Pissis, L. Stougie, Michelle Sweering
Abstract: Let $W$ be a string of length $n$ over an alphabet $\Sigma$, $k$ be a positive integer, and $\mathcal{S}$ be a set of length-$k$ substrings of $W$. The ETFS problem asks us to construct a string $X_{\mathrm{ED}}$ such that: (i) no string of $\mathcal{S}$ occurs in $X_{\mathrm{ED}}$; (ii) the order of all other length-$k$ substrings over $\Sigma$ is the same in $W$ and in $X_{\mathrm{ED}}$; and (iii) $X_{\mathrm{ED}}$ has minimal edit distance to $W$. When $W$ represents an individual's data and $\mathcal{S}$ represents a set of confidential substrings, algorithms solving ETFS can be applied for utility-preserving string sanitization [Bernardini et al., ECML PKDD 2019]. Our first result here is an algorithm to solve ETFS in $\mathcal{O}(kn^2)$ time, which improves on the state of the art [Bernardini et al., arXiv 2019] by a factor of $|\Sigma|$. Our algorithm is based on a non-trivial modification of the classic dynamic programming algorithm for computing the edit distance between two strings. Notably, we also show that ETFS cannot be solved in $\mathcal{O}(n^{2-\delta})$ time, for any $\delta>0$, unless the strong exponential time hypothesis is false. To achieve this, we reduce the edit distance problem, which is known to admit the same conditional lower bound [Bringmann and Künnemann, FOCS 2015], to ETFS.
Citations: 10