{"title":"The Longest Filled Common Subsequence Problem","authors":"M. Castelli, R. Dondi, G. Mauri, I. Zoppis","doi":"10.4230/LIPIcs.CPM.2017.14","DOIUrl":"https://doi.org/10.4230/LIPIcs.CPM.2017.14","url":null,"abstract":"Inspired by a recent approach for genome reconstruction from incomplete data, we consider a variant of the longest common subsequence problem for the comparison of two sequences, one of which is incomplete, i.e. it has some missing elements. The new combinatorial problem, called Longest Filled Common Subsequence, given two sequences A and B, and a multiset M of symbols missing in B, asks for a sequence B* obtained by inserting the symbols of M into B so that B* induces a common subsequence with A of maximum length. First, we investigate the computational and approximation complexity of the problem and we show that it is NP-hard and APX-hard when A contains at most two occurrences of each symbol. Then, we give a 3/5-approximation algorithm for the problem. Finally, we present a fixed-parameter algorithm, when the problem is parameterized by the number of symbols inserted in B that \"match\" symbols of A.","PeriodicalId":236737,"journal":{"name":"Annual Symposium on Combinatorial Pattern Matching","volume":"92 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125536428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Document Listing on Repetitive Collections with Guaranteed Performance","authors":"G. Navarro","doi":"10.4230/LIPIcs.CPM.2017.4","DOIUrl":"https://doi.org/10.4230/LIPIcs.CPM.2017.4","url":null,"abstract":"We consider document listing on string collections, that is, finding in which strings a given pattern appears. In particular, we focus on repetitive collections: a collection of size N over alphabet [1,a] is composed of D copies of a string of size n, and s single-character edits are applied on the copies. We introduce the first document listing index with size O~(n + s), precisely O((n lg a + s lg^2 N) lg D) bits, and with useful worst-case time guarantees: Given a pattern of length m, the index reports the ndoc strings where it appears in time O(m^2 + m lg N (lg D + lg^e N) ndoc), for any constant e > 0.","PeriodicalId":236737,"journal":{"name":"Annual Symposium on Combinatorial Pattern Matching","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122817428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Representing the suffix tree with the CDAWG","authors":"D. Belazzougui, F. Cunial","doi":"10.4230/LIPIcs.CPM.2017.7","DOIUrl":"https://doi.org/10.4230/LIPIcs.CPM.2017.7","url":null,"abstract":"Given a string $T$, it is known that its suffix tree can be represented using the compact directed acyclic word graph (CDAWG) with $e_T$ arcs, taking overall $O(e_T+e_{{overline{T}}})$ words of space, where ${overline{T}}$ is the reverse of $T$, and supporting some key operations in time between $O(1)$ and $O(log{log{n}})$ in the worst case. This representation is especially appealing for highly repetitive strings, like collections of similar genomes or of version-controlled documents, in which $e_T$ grows sublinearly in the length of $T$ in practice. In this paper we augment such representation, supporting a number of additional queries in worst-case time between $O(1)$ and $O(log{n})$ in the RAM model, without increasing space complexity asymptotically. Our technique, based on a heavy path decomposition of the suffix tree, enables also a representation of the suffix array, of the inverse suffix array, and of $T$ itself, that takes $O(e_T)$ words of space, and that supports random access in $O(log{n})$ time. Furthermore, we establish a connection between the reversed CDAWG of $T$ and a context-free grammar that produces $T$ and only $T$, which might have independent interest.","PeriodicalId":236737,"journal":{"name":"Annual Symposium on Combinatorial Pattern Matching","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130914613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Faster STR-IC-LCS computation via RLE","authors":"Keita Kuboi, Yuta Fujishige, Shunsuke Inenaga, H. Bannai, M. Takeda","doi":"10.4230/LIPIcs.CPM.2017.20","DOIUrl":"https://doi.org/10.4230/LIPIcs.CPM.2017.20","url":null,"abstract":"The constrained LCS problem asks one to find a longest common subsequence of two input strings $A$ and $B$ with some constraints. The STR-IC-LCS problem is a variant of the constrained LCS problem, where the solution must include a given constraint string $C$ as a substring. Given two strings $A$ and $B$ of respective lengths $M$ and $N$, and a constraint string $C$ of length at most $min{M, N}$, the best known algorithm for the STR-IC-LCS problem, proposed by Deorowicz~({em Inf. Process. Lett.}, 11:423--426, 2012), runs in $O(MN)$ time. In this work, we present an $O(mN + nM)$-time solution to the STR-IC-LCS problem, where $m$ and $n$ denote the sizes of the run-length encodings of $A$ and $B$, respectively. Since $m leq M$ and $n leq N$ always hold, our algorithm is always as fast as Deorowicz's algorithm, and is faster when input strings are compressible via RLE.","PeriodicalId":236737,"journal":{"name":"Annual Symposium on Combinatorial Pattern Matching","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128097044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast and Simple Jumbled Indexing for Binary Run-Length Encoded Strings","authors":"L. Cunha, S. Dantas, T. Gagie, Roland Wittler, L. Kowada, J. Stoye","doi":"10.4230/LIPIcs.CPM.2017.19","DOIUrl":"https://doi.org/10.4230/LIPIcs.CPM.2017.19","url":null,"abstract":"Important papers have appeared recently on the problem of indexing binary strings for jumbled pattern matching, and further lowering the time bounds in terms of the input size would now be a breakthrough with broad implications. We can still make progress on the problem, however, by considering other natural parameters. Badkobeh et al. (IPL, 2013) and Amir et al. (TCS, 2016) gave algorithms that index a binary string in O(n + r^2 log r) time, where n is the length and r is the number of runs, and Giaquinta and Grabowski (IPL, 2013) gave one that runs in O(n + r^2) time. In this paper we propose a new and very simple algorithm that also runs in O(n + r^2) time and can be extended either so that the index returns the position of a match (if there is one), or so that the algorithm uses only O(n) bits of space instead of O(n) words.","PeriodicalId":236737,"journal":{"name":"Annual Symposium on Combinatorial Pattern Matching","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117265628","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"From LZ77 to the Run-Length Encoded Burrows-Wheeler Transform, and Back","authors":"A. Policriti, N. Prezza","doi":"10.4230/LIPIcs.CPM.2017.17","DOIUrl":"https://doi.org/10.4230/LIPIcs.CPM.2017.17","url":null,"abstract":"The Lempel-Ziv factorization (LZ77) and the Run-Length encoded Burrows-Wheeler Transform (RLBWT) are two important tools in text compression and indexing, being their sizes $z$ and $r$ closely related to the amount of text self-repetitiveness. In this paper we consider the problem of converting the two representations into each other within a working space proportional to the input and the output. Let $n$ be the text length. We show that $RLBWT$ can be converted to $LZ77$ in $mathcal{O}(nlog r)$ time and $mathcal{O}(r)$ words of working space. Conversely, we provide an algorithm to convert $LZ77$ to $RLBWT$ in $mathcal{O}big(n(log r + log z)big)$ time and $mathcal{O}(r+z)$ words of working space. Note that $r$ and $z$ can be emph{constant} if the text is highly repetitive, and our algorithms can operate with (up to) emph{exponentially} less space than naive solutions based on full decompression.","PeriodicalId":236737,"journal":{"name":"Annual Symposium on Combinatorial Pattern Matching","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114185226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A family of approximation algorithms for the maximum duo-preservation string mapping problem","authors":"Bartłomiej Dudek, Paweł Gawrychowski, Piotr Ostropolski-Nalewaja","doi":"10.4230/LIPIcs.CPM.2017.10","DOIUrl":"https://doi.org/10.4230/LIPIcs.CPM.2017.10","url":null,"abstract":"In the Maximum Duo-Preservation String Mapping problem we are given two strings and wish to map the letters of the former to the letters of the latter so as to maximise the number of duos. A duo is a pair of consecutive letters that is mapped to a pair of consecutive letters in the same order. This is complementary to the well-studied Minimum Common String Partition problem, where the goal is to partition the former string into blocks that can be permuted and concatenated to obtain the latter string. \u0000Maximum Duo-Preservation String Mapping is APX-hard. After a series of improvements, Brubach [WABI 2016] showed a polynomial-time $3.25$-approximation algorithm. Our main contribution is that for any $epsilon>0$ there exists a polynomial-time $(2+epsilon)$-approximation algorithm. Similarly to a previous solution by Boria et al. [CPM 2016], our algorithm uses the local search technique. However, this is used only after a certain preliminary greedy procedure, which gives us more structure and makes a more general local search possible. We complement this with a specialised version of the algorithm that achieves $2.67$-approximation in quadratic time.","PeriodicalId":236737,"journal":{"name":"Annual Symposium on Combinatorial Pattern Matching","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127735309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Longest Common Extensions with Recompression","authors":"T. I.","doi":"10.4230/LIPIcs.CPM.2017.18","DOIUrl":"https://doi.org/10.4230/LIPIcs.CPM.2017.18","url":null,"abstract":"Given two positions i and j in a string T of length N, a longest common extension (LCE) query asks for the length of the longest common prefix between suffixes beginning at i and j. A compressed LCE data structure stores T in a compressed form while supporting fast LCE queries. In this article we show that the recompression technique is a powerful tool for compressed LCE data structures. We present a new compressed LCE data structure of size O(z lg (N/z)) that supports LCE queries in O(lg N) time, where z is the size of Lempel-Ziv 77 factorization without self-reference of T. Given T as an uncompressed form, we show how to build our data structure in O(N) time and space. Given T as a grammar compressed form, i.e., a straight-line program of size n generating T, we show how to build our data structure in O(n lg (N/n)) time and O(n + z lg (N/z)) space. Our algorithms are deterministic and always return correct answers.","PeriodicalId":236737,"journal":{"name":"Annual Symposium on Combinatorial Pattern Matching","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128410837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Computing All Distinct Squares in Linear Time for Integer Alphabets","authors":"H. Bannai, Shunsuke Inenaga, D. Köppl","doi":"10.4230/LIPIcs.CPM.2017.22","DOIUrl":"https://doi.org/10.4230/LIPIcs.CPM.2017.22","url":null,"abstract":"Given a string on an integer alphabet, we present an algorithm that computes the set of all distinct squares belonging to this string in time linear to the string length. As an application, we show how to compute the tree topology of the minimal augmented suffix tree in linear time. Asides from that, we elaborate an algorithm computing the longest previous table in a succinct representation using compressed working space.","PeriodicalId":236737,"journal":{"name":"Annual Symposium on Combinatorial Pattern Matching","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122614035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}