{"title":"Fully-functional bidirectional Burrows-Wheeler indexes","authors":"F. Cunial, D. Belazzougui","doi":"10.4230/LIPIcs.CPM.2019.10","DOIUrl":"https://doi.org/10.4230/LIPIcs.CPM.2019.10","url":null,"abstract":"Given a string $T$ on an alphabet of size $sigma$, we describe a bidirectional Burrows-Wheeler index that takes $O(|T|log{sigma})$ bits of space, and that supports the addition emph{and removal} of one character, on the left or right side of any substring of $T$, in constant time. Previously known data structures that used the same space allowed constant-time addition to any substring of $T$, but they could support removal only from specific substrings of $T$. We also describe an index that supports bidirectional addition and removal in $O(log{log{|T|}})$ time, and that occupies a number of words proportional to the number of left and right extensions of the maximal repeats of $T$. We use such fully-functional indexes to implement bidirectional, frequency-aware, variable-order de Bruijn graphs in small space, with no upper bound on their order, and supporting natural criteria for increasing and decreasing the order during traversal.","PeriodicalId":236737,"journal":{"name":"Annual Symposium on Combinatorial Pattern Matching","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127996224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Space-Efficient Computation of the LCP Array from the Burrows-Wheeler Transform","authors":"N. Prezza, Giovanna Rosone","doi":"10.4230/LIPIcs.CPM.2019.7","DOIUrl":"https://doi.org/10.4230/LIPIcs.CPM.2019.7","url":null,"abstract":"We show that the Longest Common Prefix Array of a text collection of total size n on alphabet [1, {sigma}] can be computed from the Burrows-Wheeler transformed collection in O(n log {sigma}) time using o(n log {sigma}) bits of working space on top of the input and output. Our result improves (on small alphabets) and generalizes (to string collections) the previous solution from Beller et al., which required O(n) bits of extra working space. We also show how to merge the BWTs of two collections of total size n within the same time and space bounds. The procedure at the core of our algorithms can be used to enumerate suffix tree intervals in succinct space from the BWT, which is of independent interest. An engineered implementation of our first algorithm on DNA alphabet induces the LCP of a large (16 GiB) collection of short (100 bases) reads at a rate of 2.92 megabases per second using in total 1.5 Bytes per base in RAM. Our second algorithm merges the BWTs of two short-reads collections of 8 GiB each at a rate of 1.7 megabases per second and uses 0.625 Bytes per base in RAM. An extension of this algorithm that computes also the LCP array of the merged collection processes the data at a rate of 1.48 megabases per second and uses 1.625 Bytes per base in RAM.","PeriodicalId":236737,"journal":{"name":"Annual Symposium on Combinatorial Pattern Matching","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131193269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimal Rank and Select Queries on Dictionary-Compressed Text","authors":"N. Prezza","doi":"10.4230/LIPIcs.CPM.2019.4","DOIUrl":"https://doi.org/10.4230/LIPIcs.CPM.2019.4","url":null,"abstract":"We study the problem of supporting queries on a string $S$ of length $n$ within a space bounded by the size $gamma$ of a string attractor for $S$. Recent works showed that random access on $S$ can be supported in optimal $O(log(n/gamma)/loglog n)$ time within $Oleft (gamma rm{polylog} n right)$ space. In this paper, we extend this result to emph{rank} and emph{select} queries and provide lower bounds matching our upper bounds on alphabets of polylogarithmic size. Our solutions are given in the form of a space-time trade-off that is more general than the one previously known for grammars and that improves existing bounds on LZ77-compressed text by a $loglog n$ time-factor in emph{select} queries. We also provide matching lower and upper bounds for emph{partial sum} and emph{predecessor} queries within attractor-bounded space, and extend our lower bounds to encompass navigation of dictionary-compressed tree representations.","PeriodicalId":236737,"journal":{"name":"Annual Symposium on Combinatorial Pattern Matching","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126369176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Approximating Approximate Pattern Matching","authors":"J. Studeny, P. Uznański","doi":"10.4230/LIPIcs.CPM.2019.15","DOIUrl":"https://doi.org/10.4230/LIPIcs.CPM.2019.15","url":null,"abstract":"Given a text $T$ of length $n$ and a pattern $P$ of length $m$, the approximate pattern matching problem asks for computation of a particular emph{distance} function between $P$ and every $m$-substring of $T$. We consider a $(1pmvarepsilon)$ multiplicative approximation variant of this problem, for $ell_p$ distance function. In this paper, we describe two $(1+varepsilon)$-approximate algorithms with a runtime of $widetilde{O}(frac{n}{varepsilon})$ for all (constant) non-negative values of $p$. For constant $p ge 1$ we show a deterministic $(1+varepsilon)$-approximation algorithm. Previously, such run time was known only for the case of $ell_1$ distance, by Gawrychowski and Uznanski [ICALP 2018] and only with a randomized algorithm. For constant $0 le p le 1$ we show a randomized algorithm for the $ell_p$, thereby providing a smooth tradeoff between algorithms of Kopelowitz and Porat [FOCS~2015, SOSA~2018] for Hamming distance (case of $p=0$) and of Gawrychowski and Uznanski for $ell_1$ distance.","PeriodicalId":236737,"journal":{"name":"Annual Symposium on Combinatorial Pattern Matching","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126321322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
L. Bulteau, Konrad K. Dabrowski, G. Fertin, Matthew Johnson, D. Paulusma, Stéphane Vialette
{"title":"Finding a Small Number of Colourful Components","authors":"L. Bulteau, Konrad K. Dabrowski, G. Fertin, Matthew Johnson, D. Paulusma, Stéphane Vialette","doi":"10.4230/LIPICS.CPM.2019.20","DOIUrl":"https://doi.org/10.4230/LIPICS.CPM.2019.20","url":null,"abstract":"A partition (V_1,...,V_k) of the vertex set of a graph G with a (not necessarily proper) colouring c is colourful if no two vertices in any V_i have the same colour and every set V_i induces a connected graph. The Colourful Partition problem, introduced by Adamaszek and Popa, is to decide whether a coloured graph (G,c) has a colourful partition of size at most k. This problem is related to the Colourful Components problem, introduced by He, Liu and Zhao, which is to decide whether a graph can be modified into a graph whose connected components form a colourful partition by deleting at most p edges. \u0000Despite the similarities in their definitions, we show that Colourful Partition and Colourful Components may have different complexities for restricted instances. We tighten known NP-hardness results for both problems by closing a number of complexity gaps. In addition, we prove new hardness and tractability results for Colourful Partition. In particular, we prove that deciding whether a coloured graph (G,c) has a colourful partition of size 2 is NP-complete for coloured planar bipartite graphs of maximum degree 3 and path-width 3, but polynomial-time solvable for coloured graphs of treewidth 2. \u0000Rather than performing an ad hoc study, we use our classical complexity results to guide us in undertaking a thorough parameterized study of Colourful Partition. We show that this leads to suitable parameters for obtaining FPT results and moreover prove that Colourful Components and Colourful Partition may have different parameterized complexities, depending on the chosen parameter.","PeriodicalId":236737,"journal":{"name":"Annual Symposium on Combinatorial Pattern Matching","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131110597","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Superstrings with multiplicities","authors":"Bastien Cazaux, Eric Rivals","doi":"10.4230/LIPIcs.CPM.2018.21","DOIUrl":"https://doi.org/10.4230/LIPIcs.CPM.2018.21","url":null,"abstract":"A superstring of a set of words P = {s_1, ..., s_p } is a string that contains each word of P as substring. Given P, the well known Shortest Linear Superstring problem (SLS), asks for a shortest superstring of P. In a variant of SLS, called Multi-SLS, each word s_i comes with an integer m(i), its multiplicity, that sets a constraint on its number of occurrences, and the goal is to find a shortest superstring that contains at least m(i) occurrences of s_i. Multi-SLS generalizes SLS and is obviously as hard to solve, but it has been studied only in special cases (with words of length 2 or with a fixed number of words). The approximability of Multi-SLS in the general case remains open. Here, we study the approximability of Multi-SLS and that of the companion problem Multi-SCCS, which asks for a shortest cyclic cover instead of shortest superstring. First, we investigate the approximation of a greedy algorithm for maximizing the compression offered by a superstring or by a cyclic cover: the approximation ratio is 1/2 for Multi-SLS and 1 for Multi-SCCS. Then, we exhibit a linear time approximation algorithm, Concat-Greedy, and show it achieves a ratio of 4 regarding the superstring length. This demonstrates that for both measures Multi-SLS belongs to the class of APX problems.","PeriodicalId":236737,"journal":{"name":"Annual Symposium on Combinatorial Pattern Matching","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115399672","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Maximum Colorful Arborescence Problem and Color Hierarchy Graph Structure","authors":"G. Fertin, J. Fradin, Christian Komusiewicz","doi":"10.4230/LIPIcs.CPM.2018.17","DOIUrl":"https://doi.org/10.4230/LIPIcs.CPM.2018.17","url":null,"abstract":"Let G = (V, A) be a vertex-colored arc-weighted directed acyclic graph (DAG) rooted in some vertex r. The color hierarchy graph H(G) of G is defined as follows: V (H(G)) is the color set C of G, and H(G) has an arc from c to c if G has an arc from a vertex of color c to a vertex of color c. We study the Maximum Colorful Arborescence (MCA) problem, which takes as input a DAG G such that H(G) is also a DAG, and aims at finding in G a maximum-weight arborescence rooted in r in which no color appears more than once. The MCA problem models the de novo inference of unknown metabolites by mass spectrometry experiments. Although the problem has been introduced ten years ago (under a different name), it was only recently pointed out that a crucial additional property in the problem definition was missing: by essence, H(G) must be a DAG. In this paper, we further investigate MCA under this new light and provide new algorithmic results for this problem, with a specific focus on fixed-parameter tractability (FPT) issues for different structural parameters of H(G). In particular, we show there exists an O(3 ∗ H) time algorithm for solving MCA, where nH is the number of vertices of indegree at least two in H(G), thereby improving the O(3) algorithm from Böcker et al. [Proc. ECCB ’08]. We also prove that MCA is W[2]-hard relatively to the treewidth Ht of the underlying undirected graph of H(G), and further show that it is FPT relatively to Ht + lC , where lC := |V | − |C|. 2012 ACM Subject Classification F.2.2 Nonnumerical Algorithms and Problems, G.2.1 Combinatorics, G.2.2 Graph Theory","PeriodicalId":236737,"journal":{"name":"Annual Symposium on Combinatorial Pattern Matching","volume":"4 9","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121008188","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Isamu Furuya, Yuto Nakashima, I. Tomohiro, Shunsuke Inenaga, H. Bannai, M. Takeda
{"title":"Lyndon Factorization of Grammar Compressed Texts Revisited","authors":"Isamu Furuya, Yuto Nakashima, I. Tomohiro, Shunsuke Inenaga, H. Bannai, M. Takeda","doi":"10.4230/LIPIcs.CPM.2018.24","DOIUrl":"https://doi.org/10.4230/LIPIcs.CPM.2018.24","url":null,"abstract":"We revisit the problem of computing the Lyndon factorization of a string w of length N which is given as a straight line program (SLP) of size n. For this problem, we show a new algorithm which runs in O(P(n, N) + Q(n, N)n log log N) time and O(n log N + S(n, N)) space where P(n, N), S(n,N), Q(n,N) are respectively the pre-processing time, space, and query time of a data structure for longest common extensions (LCE) on SLPs. Our algorithm improves the algorithm proposed by I et al. (TCS '17), and can be more efficient than the O(N)-time solution by Duval (J. Algorithms '83) when w is highly compressible.","PeriodicalId":236737,"journal":{"name":"Annual Symposium on Combinatorial Pattern Matching","volume":"47 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114003091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Takafumi Inoue, Shunsuke Inenaga, Heikki Hyyrö, H. Bannai, M. Takeda
{"title":"Computing longest common square subsequences","authors":"Takafumi Inoue, Shunsuke Inenaga, Heikki Hyyrö, H. Bannai, M. Takeda","doi":"10.4230/LIPIcs.CPM.2018.15","DOIUrl":"https://doi.org/10.4230/LIPIcs.CPM.2018.15","url":null,"abstract":"A square is a non-empty string of form YY. The longest common square subsequence (LCSqS) problem is to compute a longest square occurring as a subsequence in two given strings A and B. We show that the problem can easily be solved in O(n^6) time or O(|M|n^4) time with O(n^4) space, where n is the length of the strings and M is the set of matching points between A and B. Then, we show that the problem can also be solved in O(sigma |M|^3 + n) time and O(|M|^2 + n) space, or in O(|M|^3 log^2 n log log n + n) time with O(|M|^3 + n) space, where sigma is the number of distinct characters occurring in A and B. We also study lower bounds for the LCSqS problem for two or more strings.","PeriodicalId":236737,"journal":{"name":"Annual Symposium on Combinatorial Pattern Matching","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127553686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Y. Urabe, Yuto Nakashima, Shunsuke Inenaga, H. Bannai, M. Takeda
{"title":"Longest Lyndon Substring After Edit","authors":"Y. Urabe, Yuto Nakashima, Shunsuke Inenaga, H. Bannai, M. Takeda","doi":"10.4230/LIPIcs.CPM.2018.19","DOIUrl":"https://doi.org/10.4230/LIPIcs.CPM.2018.19","url":null,"abstract":"The longest Lyndon substring of a string T is the longest substring of T which is a Lyndon word. LLS(T) denotes the length of the longest Lyndon substring of a string T. In this paper, we consider computing LLS(T') where T' is an edited string formed from T. After O(n) time and space preprocessing, our algorithm returns LLS(T') in O(log n) time for any single character edit. We also consider a version of the problem with block edits, i.e., a substring of T is replaced by a given string of length l. After O(n) time and space preprocessing, our algorithm returns LLS(T') in O(l log sigma + log n) time for any block edit where sigma is the number of distinct characters in T. We can modify our algorithm so as to output all the longest Lyndon substrings of T' for both problems.","PeriodicalId":236737,"journal":{"name":"Annual Symposium on Combinatorial Pattern Matching","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125657054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}