{"title":"Longest substring palindrome after edit","authors":"Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, H. Bannai, M. Takeda","doi":"10.4230/LIPIcs.CPM.2018.12","DOIUrl":"https://doi.org/10.4230/LIPIcs.CPM.2018.12","url":null,"abstract":"It is known that the length of the longest substring palindromes (LSPals) of a given string T of length n can be computed in O(n) time by Manacher's algorithm [J. ACM '75]. In this paper, we consider the problem of finding the LSPal after the string is edited. We present an algorithm that uses O(n) time and space for preprocessing, and answers the length of the LSPals in O(log (min {sigma, log n })) time after single character substitution, insertion, or deletion, where sigma denotes the number of distinct characters appearing in T. We also propose an algorithm that uses O(n) time and space for preprocessing, and answers the length of the LSPals in O(l + log n) time, after an existing substring in T is replaced by a string of arbitrary length l.","PeriodicalId":236737,"journal":{"name":"Annual Symposium on Combinatorial Pattern Matching","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133094136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Faster Online Elastic Degenerate String Matching","authors":"Kotaro Aoyama, Yuto Nakashima, I. Tomohiro, Shunsuke Inenaga, H. Bannai, M. Takeda","doi":"10.4230/LIPIcs.CPM.2018.9","DOIUrl":"https://doi.org/10.4230/LIPIcs.CPM.2018.9","url":null,"abstract":"An Elastic-Degenerate String [Iliopoulos et al., LATA 2017] is a sequence of sets of strings, which was recently proposed as a way to model a set of similar sequences. We give an online algorithm for the Elastic-Degenerate String Matching (EDSM) problem that runs in O(nm sqrt{m log m} + N) time and O(m) working space, where n is the number of elastic degenerate segments of the text, N is the total length of all strings in the text, and m is the length of the pattern. This improves the previous algorithm by Grossi et al. [CPM 2017] that runs in O(nm^2 + N) time.","PeriodicalId":236737,"journal":{"name":"Annual Symposium on Combinatorial Pattern Matching","volume":"172 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133613328","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dualities in Tree Representations","authors":"R. Chikhi, A. Schönhuth","doi":"10.4230/LIPIcs.CPM.2018.18","DOIUrl":"https://doi.org/10.4230/LIPIcs.CPM.2018.18","url":null,"abstract":"A characterization of the tree $T^*$ such that $\\mathrm{BP}(T^*)=\\overleftrightarrow{\\mathrm{DFUDS}(T)}$, the reversal of $\\mathrm{DFUDS}(T)$ is given. An immediate consequence is a rigorous characterization of the tree $\\hat{T}$ such that $\\mathrm{BP}(\\hat{T})=\\mathrm{DFUDS}(T)$. In summary, $\\mathrm{BP}$ and $\\mathrm{DFUDS}$ are unified within an encompassing framework, which might have the potential to imply future simplifications with regard to queries in $\\mathrm{BP}$ and/or $\\mathrm{DFUDS}$. Immediate benefits displayed here are to identify so far unnoted commonalities in most recent work on the Range Minimum Query problem, and to provide improvements for the Minimum Length Interval Query problem.","PeriodicalId":236737,"journal":{"name":"Annual Symposium on Combinatorial Pattern Matching","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123945087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Undetected Redundancy in the Burrows-Wheeler Transform","authors":"Uwe Baier","doi":"10.4230/LIPIcs.CPM.2018.3","DOIUrl":"https://doi.org/10.4230/LIPIcs.CPM.2018.3","url":null,"abstract":"The Burrows-Wheeler Transform (BWT) is an invertible permutation of a text known to be highly compressible but also useful for sequence analysis, which makes the BWT highly attractive for lossless data compression. In this paper, we present a new technique to reduce the size of a BWT using its combinatorial properties, while keeping it invertible. The technique can be applied to any BWT-based compressor, and, as experiments show, is able to reduce the encoding size by 8-16% on average and up to 33-57% in the best cases (depending on the BWT-compressor used), making BWT-based compressors competitive or even superior to today's best lossless compressors.","PeriodicalId":236737,"journal":{"name":"Annual Symposium on Combinatorial Pattern Matching","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125963198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Clique-Based Lower Bounds for Parsing Tree-Adjoining Grammars","authors":"K. Bringmann, Philip Wellnitz","doi":"10.4230/LIPIcs.CPM.2017.12","DOIUrl":"https://doi.org/10.4230/LIPIcs.CPM.2017.12","url":null,"abstract":"Tree-adjoining grammars are a generalization of context-free grammars that are well suited to model human languages and are thus popular in computational linguistics. In the tree-adjoining grammar recognition problem, given a grammar $\\Gamma$ and a string $s$ of length $n$, the task is to decide whether $s$ can be obtained from $\\Gamma$. Rajasekaran and Yooseph's parser (JCSS'98) solves this problem in time $O(n^{2\\omega})$, where $\\omega < 2.373$ is the matrix multiplication exponent. The best algorithms avoiding fast matrix multiplication take time $O(n^6)$. The first evidence for hardness was given by Satta (J. Comp. Linguist.'94): For a more general parsing problem, any algorithm that avoids fast matrix multiplication and is significantly faster than $O(|\\Gamma| n^6)$ in the case of $|\\Gamma| = \\Theta(n^{12})$ would imply a breakthrough for Boolean matrix multiplication. Following an approach by Abboud et al. (FOCS'15) for context-free grammar recognition, in this paper we resolve many of the disadvantages of the previous lower bound. We show that, even on constant-size grammars, any improvement on Rajasekaran and Yooseph's parser would imply a breakthrough for the $k$-Clique problem. This establishes tree-adjoining grammar parsing as a practically relevant problem with the unusual running time of $n^{2\\omega}$, up to lower order factors.","PeriodicalId":236737,"journal":{"name":"Annual Symposium on Combinatorial Pattern Matching","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-03-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133911803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Linear-Time Algorithm for Long LCF with k Mismatches","authors":"P. Charalampopoulos, M. Crochemore, C. Iliopoulos, T. Kociumaka, S. Pissis, J. Radoszewski, W. Rytter, Tomasz Waleń","doi":"10.4230/LIPIcs.CPM.2018.23","DOIUrl":"https://doi.org/10.4230/LIPIcs.CPM.2018.23","url":null,"abstract":"In the Longest Common Factor with $k$ Mismatches (LCF$_k$) problem, we are given two strings $X$ and $Y$ of total length $n$, and we are asked to find a pair of maximal-length factors, one of $X$ and the other of $Y$, such that their Hamming distance is at most $k$. Thankachan et al. show that this problem can be solved in $\\mathcal{O}(n \\log^k n)$ time and $\\mathcal{O}(n)$ space for constant $k$. We consider the LCF$_k$($\\ell$) problem in which we assume that the sought factors have length at least $\\ell$, and the LCF$_k$($\\ell$) problem for $\\ell=\\Omega(\\log^{2k+2} n)$, which we call the Long LCF$_k$ problem. We use difference covers to reduce the Long LCF$_k$ problem to a task involving $m=\\mathcal{O}(n/\\log^{k+1}n)$ synchronized factors. The latter can be solved in $\\mathcal{O}(m \\log^{k+1}m)$ time, which results in a linear-time algorithm for Long LCF$_k$. In general, our solution to LCF$_k$($\\ell$) for arbitrary $\\ell$ takes $\\mathcal{O}(n + n \\log^{k+1} n/\\sqrt{\\ell})$ time.","PeriodicalId":236737,"journal":{"name":"Annual Symposium on Combinatorial Pattern Matching","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-02-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129672186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Online LZ77 Parsing and Matching Statistics with RLBWTs","authors":"H. Bannai, T. Gagie, I. Tomohiro","doi":"10.4230/LIPIcs.CPM.2018.7","DOIUrl":"https://doi.org/10.4230/LIPIcs.CPM.2018.7","url":null,"abstract":"Lempel-Ziv 1977 (LZ77) parsing, matching statistics and the Burrows-Wheeler Transform (BWT) are all fundamental elements of stringology. In a series of recent papers, Policriti and Prezza (DCC 2016 and Algorithmica, CPM 2017) showed how we can use an augmented run-length compressed BWT (RLBWT) of the reverse $T^R$ of a text $T$, to compute offline the LZ77 parse of $T$ in $O(n \\log r)$ time and $O(r)$ space, where $n$ is the length of $T$ and $r$ is the number of runs in the BWT of $T^R$. In this paper we first extend a well-known technique for updating an unaugmented RLBWT when a character is prepended to a text, to work with Policriti and Prezza's augmented RLBWT. This immediately implies that we can build online the LZ77 parse of $T$ while still using $O(n \\log r)$ time and $O(r)$ space; it also seems likely to be of independent interest. Our experiments, using an extension of Ohno, Takabatake, I and Sakamoto's (IWOCA 2017) implementation of updating, show our approach is both time- and space-efficient for repetitive strings. We then show how to augment the RLBWT further (albeit making it static again and increasing its space by a factor proportional to the size of the alphabet) such that later, given another string $S$ and $O(\\log \\log n)$-time random access to $T$, we can compute the matching statistics of $S$ with respect to $T$ in $O(|S| \\log \\log n)$ time.","PeriodicalId":236737,"journal":{"name":"Annual Symposium on Combinatorial Pattern Matching","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123879618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Text Indexing and Searching in Sublinear Time","authors":"J. Munro, G. Navarro, Yakov Nekrich","doi":"10.4230/LIPIcs.CPM.2020.24","DOIUrl":"https://doi.org/10.4230/LIPIcs.CPM.2020.24","url":null,"abstract":"We introduce the first index that can be built in $o(n)$ time for a text of length $n$, and also queried in $o(m)$ time for a pattern of length $m$. On a constant-size alphabet, for example, our index uses $O(n\\log^{1/2+\\varepsilon}n)$ bits, is built in $O(n/\\log^{1/2-\\varepsilon} n)$ deterministic time, and finds the $\\mathrm{occ}$ pattern occurrences in time $O(m/\\log n + \\sqrt{\\log n}\\log\\log n + \\mathrm{occ})$, where $\\varepsilon>0$ is an arbitrarily small constant. As a comparison, the most recent classical text index uses $O(n\\log n)$ bits, is built in $O(n)$ time, and searches in time $O(m/\\log n + \\log\\log n + \\mathrm{occ})$. We build on a novel text sampling based on difference covers, which enjoys properties that allow us to compute longest common prefixes efficiently, in constant time. We extend our results to the secondary memory model as well, where we give the first construction in $o(Sort(n))$ time of a data structure with suffix array functionality, which can search for patterns in almost optimal time, with an additive penalty of $O(\\sqrt{\\log_{M/B} n}\\log\\log n)$, where $M$ is the size of main memory available and $B$ is the disk block size.","PeriodicalId":236737,"journal":{"name":"Annual Symposium on Combinatorial Pattern Matching","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124411109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gapped Pattern Statistics","authors":"Philippe Duchon, C. Nicaud, Carine Pivoteau","doi":"10.4230/LIPIcs.CPM.2017.21","DOIUrl":"https://doi.org/10.4230/LIPIcs.CPM.2017.21","url":null,"abstract":"We give a probabilistic analysis of parameters related to $\\alpha$-gapped repeats and palindromes in random words, under both uniform and memoryless distributions (where letters have different probabilities, but are drawn independently). More precisely, we study the expected number of maximal $\\alpha$-gapped patterns, as well as the expected length of the longest $\\alpha$-gapped pattern in a random word.","PeriodicalId":236737,"journal":{"name":"Annual Symposium on Combinatorial Pattern Matching","volume":"116 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114174904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On-Line Pattern Matching on Similar Texts","authors":"R. Grossi, C. Iliopoulos, Chang Liu, N. Pisanti, S. Pissis, Ahmad Retha, Giovanna Rosone, Fatima Vayani, Luca Versari","doi":"10.4230/LIPIcs.CPM.2017.9","DOIUrl":"https://doi.org/10.4230/LIPIcs.CPM.2017.9","url":null,"abstract":"Pattern matching on a set of similar texts has received much attention, especially recently, mainly due to its application in cataloguing human genetic variation. In particular, many different algorithms have been proposed for the off-line version of this problem; that is, constructing a compressed index for a set of similar texts in order to answer pattern matching queries efficiently. However, the on-line, more fundamental, version of this problem is a rather undeveloped topic. Solutions to the on-line version can be beneficial for a number of reasons; for instance, efficient on-line solutions can be used in combination with partial indexes as practical trade-offs. Here we attempt to close this gap by proposing two efficient algorithms for this problem. Notably, one of the algorithms requires time linear in the size of the texts' representation, for short patterns. Furthermore, experimental results confirm our theoretical findings in practical terms.","PeriodicalId":236737,"journal":{"name":"Annual Symposium on Combinatorial Pattern Matching","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133771398","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}