Annual Symposium on Combinatorial Pattern Matching: Latest Publications

From Bit-Parallelism to Quantum String Matching for Labelled Graphs

Annual Symposium on Combinatorial Pattern Matching Pub Date : 2023-02-06 DOI: 10.4230/LIPIcs.CPM.2023.9

Massimo Equi, A. V. D. Griend, V. Mäkinen

Abstract: Many problems that can be solved in quadratic time have bit-parallel speed-ups with factor $w$, where $w$ is the computer word size. A classic example is computing the edit distance of two strings of length $n$, which can be solved in $O(n^2/w)$ time. In a reasonable classical model of computation, one can assume $w=\Theta(\log n)$, and obtaining significantly better speed-ups is unlikely in the light of conditional lower bounds obtained for such problems. In this paper, we study the connection of bit-parallelism to quantum computation, aiming to see if a bit-parallel algorithm could be converted to a quantum algorithm with better than logarithmic speed-up. We focus on string matching in labeled graphs, the problem of finding an exact occurrence of a string as the label of a path in a graph. This problem admits a quadratic conditional lower bound under a very restricted class of graphs (Equi et al. ICALP 2019), stating that no algorithm in the classical model of computation can solve the problem in time $O(|P||E|^{1-\epsilon})$ or $O(|P|^{1-\epsilon}|E|)$. We show that a simple bit-parallel algorithm on such a restricted family of graphs (level DAGs) can indeed be converted into a realistic quantum algorithm that attains subquadratic time complexity $O(|E|\sqrt{|P|})$.
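The bit-parallel idea mentioned in the abstract above can be illustrated with a Shift-And style scan over a labelled DAG. This is a minimal sketch under stated assumptions, not the paper's level-DAG algorithm and not its quantum counterpart: the function name `shift_and_dag` and its parameters are hypothetical, each node is assumed to carry a single-character label, nodes are assumed to be numbered in topological order (every edge goes from a smaller to a larger index), and Python's arbitrary-precision integers stand in for machine words of size $w$.

```python
from collections import defaultdict


def shift_and_dag(labels, edges, pattern):
    """Return True if `pattern` is spelled by the node labels of some path.

    labels  -- list of single-character labels; node ids are 0..n-1
    edges   -- list of (u, v) pairs with u < v (assumed topological numbering)
    pattern -- the string P to search for
    """
    m = len(pattern)
    if m == 0:
        return True

    # mask[c] has bit i set exactly when pattern[i] == c.
    mask = defaultdict(int)
    for i, c in enumerate(pattern):
        mask[c] |= 1 << i

    # Predecessor lists, so each node can combine the states of its in-neighbours.
    preds = defaultdict(list)
    for u, v in edges:
        preds[v].append(u)

    accept = 1 << (m - 1)
    # state[v] has bit i set iff pattern[0..i] matches some path ending at v.
    state = [0] * len(labels)
    for v in range(len(labels)):
        incoming = 0
        for u in preds[v]:
            incoming |= state[u]
        # Extend every prefix arriving from a predecessor by one character,
        # and always allow a fresh match to start at v (the "| 1").
        state[v] = ((incoming << 1) | 1) & mask[labels[v]]
        if state[v] & accept:
            return True
    return False


# Small hypothetical graph: 0:a -> 1:b -> 2:c and 0:a -> 3:b -> 4:d
labels = ["a", "b", "c", "b", "d"]
edges = [(0, 1), (0, 3), (1, 2), (3, 4)]
print(shift_and_dag(labels, edges, "abd"))  # True
print(shift_and_dag(labels, edges, "acd"))  # False
```

Each edge triggers one word-sized OR and each node one shift and one AND, so with patterns longer than a machine word the cost grows by a factor of roughly $|P|/w$; that is the kind of $w$-fold saving over the naive quadratic scan that the abstract refers to.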

Citations: 1

Optimal LZ-End Parsing is Hard

Annual Symposium on Combinatorial Pattern Matching Pub Date : 2023-02-06 DOI: 10.48550/arXiv.2302.02586

H. Bannai, Mitsuru Funakoshi, Kazuhiro Kurita, Yuto Nakashima, Kazuhisa Seto, T. Uno

Citations: 0

Order-Preserving Squares in Strings

Annual Symposium on Combinatorial Pattern Matching Pub Date : 2023-02-01 DOI: 10.48550/arXiv.2302.00724

Paweł Gawrychowski, Samah Ghazawi, G. M. Landau

Citations: 0

Sliding Window String Indexing in Streams

Annual Symposium on Combinatorial Pattern Matching Pub Date : 2023-01-23 DOI: 10.48550/arXiv.2301.09477

P. Bille, J. Fischer, I. L. Gørtz, Max Rishøj Pedersen, Tord Stordalen

Abstract: Given a string $S$ over an alphabet $\Sigma$, the 'string indexing problem' is to preprocess $S$ to subsequently support efficient pattern matching queries, i.e., given a pattern string $P$ report all the occurrences of $P$ in $S$. In this paper we study the 'streaming sliding window string indexing problem'. Here the string $S$ arrives as a stream, one character at a time, and the goal is to maintain an index of the last $w$ characters, called the 'window', for a specified parameter $w$. At any point in time a pattern matching query for a pattern $P$ may arrive, also streamed one character at a time, and all occurrences of $P$ within the current window must be returned. The streaming sliding window string indexing problem naturally captures scenarios where we want to index the most recent data (i.e. the window) of a stream while supporting efficient pattern matching. Our main result is a simple $O(w)$ space data structure that uses $O(\log w)$ time with high probability to process each character from both the input string $S$ and the pattern string $P$. Reporting each occurrence of $P$ uses additional constant time per reported occurrence. Compared to previous work in similar scenarios this result is the first to achieve an efficient worst-case time per character from the input stream. We also consider a delayed variant of the problem, where a query may be answered at any point within the next $\delta$ characters that arrive from either stream. We present an $O(w + \delta)$ space data structure for this problem that improves the above time bounds to $O(\log(w/\delta))$. In particular, for a delay of $\delta = \epsilon w$ we obtain an $O(w)$ space data structure with constant time processing per character. The key idea to achieve our result is a novel and simple hierarchical structure of suffix trees of independent interest, inspired by the classic log-structured merge trees.
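To make the problem statement in the abstract above concrete, here is a naive baseline that only fixes the interface: keep the last $w$ characters and answer a query by scanning the window. It is a sketch under stated assumptions and deliberately not the paper's data structure (it uses no suffix trees and a query costs time proportional to the window, not $O(\log w)$ per character). The class name `NaiveSlidingWindowIndex` and its methods are hypothetical, and occurrences are reported as positions in the whole stream, counted from 0.

```python
from collections import deque


class NaiveSlidingWindowIndex:
    """Baseline for the sliding window indexing problem (illustration only)."""

    def __init__(self, w):
        # A bounded deque silently drops the oldest character once it holds w.
        self.window = deque(maxlen=w)
        self.seen = 0  # total number of stream characters processed so far

    def append(self, ch):
        """Process one character of the input stream S."""
        self.window.append(ch)
        self.seen += 1

    def occurrences(self, pattern):
        """Return stream positions of all occurrences of `pattern` in the window."""
        text = "".join(self.window)
        offset = self.seen - len(text)  # stream position of the window's first char
        hits = []
        i = text.find(pattern)
        while i != -1:
            hits.append(offset + i)
            i = text.find(pattern, i + 1)  # step by one so overlaps are reported
        return hits


index = NaiveSlidingWindowIndex(w=5)
for ch in "abracadabra":
    index.append(ch)
# The window now holds "dabra", i.e. stream positions 6..10.
print(index.occurrences("abra"))  # [7]
print(index.occurrences("cad"))   # []
```

A baseline like this is mainly useful as a reference implementation to test a faster index against, since its answers are easy to verify by hand.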

Citations: 1

Parameterized Algorithms for String Matching to DAGs: Funnels and Beyond

Annual Symposium on Combinatorial Pattern Matching Pub Date : 2022-12-15 DOI: 10.48550/arXiv.2212.07870

Manuel Cáceres

Citations: 3

Merging Sorted Lists of Similar Strings

Annual Symposium on Combinatorial Pattern Matching Pub Date : 2022-08-19 DOI: 10.48550/arXiv.2208.09351

E. Myers

Citations: 0

L-systems for Measuring Repetitiveness

Annual Symposium on Combinatorial Pattern Matching Pub Date : 2022-06-03 DOI: 10.48550/arXiv.2206.01688

G. Navarro, Cristian Urbina

Citations: 0

Efficient Construction of the BWT for Repetitive Text Using String Compression

Annual Symposium on Combinatorial Pattern Matching Pub Date : 2022-04-12 DOI: 10.48550/arXiv.2204.05969

Diego Díaz-Domínguez, G. Navarro

Abstract: We present a new semi-external algorithm that builds the Burrows-Wheeler transform variant of Bauer et al. (a.k.a. BCR BWT) in linear expected time. Our method uses compression techniques to reduce computational costs when the input is massive and repetitive. Concretely, we build on induced suffix sorting (ISS) and resort to run-length and grammar compression to maintain our intermediate results in compact form. Our compression format not only saves space but also speeds up the required computations. Our experiments show important space and computation time savings when the text is repetitive. In moderate-size collections of real human genome assemblies (14.2 GB - 75.05 GB), our memory peak is, on average, 1.7x smaller than the peak of the state-of-the-art BCR BWT construction algorithm (ropebwt2), while running 5x faster. Our current implementation was also able to compute the BCR BWT of 400 real human genome assemblies (1.2 TB) in 41.21 hours using 118.83 GB of working memory (around 10% of the input size). Interestingly, the results we report in the 1.2 TB file are dominated by the difficulties of scanning huge files under memory constraints (specifically, I/O operations). This fact indicates we can perform much better with a more careful implementation of our method, thus scaling to even bigger sizes efficiently.
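As a small illustration of why run-length compression pays off on repetitive collections, the sketch below builds a BWT of a string collection by naively sorting all suffixes and then run-length encodes it. This is not the paper's semi-external ISS-based method. The function names are hypothetical, and the convention used here (one end marker per string, markers ordered by string index and all printed as `$`) is an assumption that may differ in detail from the BCR BWT; input strings are assumed not to contain `$`.

```python
from itertools import groupby


def multi_string_bwt(strings):
    """Naive BWT of a collection, using one end marker per string."""
    k = len(strings)
    entries = []
    for i, s in enumerate(strings):
        # Encode the end marker of string i as the integer i and every real
        # character as k + ord(c), so all markers sort before all characters.
        seq = [k + ord(c) for c in s] + [i]
        for j in range(len(seq)):
            # Symbol preceding the suffix; j == 0 wraps around to the end marker.
            prev = seq[j - 1]
            entries.append((tuple(seq[j:]), prev))
    entries.sort()  # suffixes are pairwise distinct thanks to the markers
    return "".join("$" if p < k else chr(p - k) for _, p in entries)


def run_length_encode(text):
    """Collapse maximal runs of equal symbols into (symbol, length) pairs."""
    return [(ch, len(list(run))) for ch, run in groupby(text)]


bwt = multi_string_bwt(["ab", "ab"])
print(bwt)                     # bb$$aa
print(run_length_encode(bwt))  # [('b', 2), ('$', 2), ('a', 2)]
```

Two identical input strings produce runs of length two everywhere, which hints at how near-identical genome assemblies lead to long runs. The naive sort is quadratic or worse and only suitable for toy inputs.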

Citations: 5

Reduction ratio of the IS-algorithm: worst and random cases

Annual Symposium on Combinatorial Pattern Matching Pub Date : 2022-04-09 DOI: 10.48550/arXiv.2204.04422

Vincent Jugé

Citations: 0

A theoretical and experimental analysis of BWT variants for string collections

Annual Symposium on Combinatorial Pattern Matching Pub Date : 2022-02-26 DOI: 10.4230/LIPIcs.CPM.2022.25

David Cenzato, Zsuzsanna Lipták

Citations: 1