Massimo Cairo, Romeo Rizzi, Alexandru I. Tomescu, Elia C. Zirondelli
{"title":"Genome assembly, from practice to theory: safe, complete and <i>linear-time</i>","authors":"Massimo Cairo, Romeo Rizzi, Alexandru I. Tomescu, Elia C. Zirondelli","doi":"10.1145/3632176","DOIUrl":"https://doi.org/10.1145/3632176","url":null,"abstract":"Genome assembly asks to reconstruct an unknown string from many shorter substrings of it. Even though it is one of the key problems in Bioinformatics, it is generally lacking major theoretical advances. Its hardness stems both from practical issues (size and errors of real data), and from the fact that problem formulations inherently admit multiple solutions. Given these, at their core, most state-of-the-art assemblers are based on finding non-branching paths (unitigs) in an assembly graph. While such paths constitute only partial assemblies, they are likely to be correct. More precisely, if one defines a genome assembly solution as a closed arc-covering walk of the graph, then unitigs appear in all solutions, being thus safe partial solutions. Until recently, it was open what are all the safe walks of an assembly graph. Tomescu and Medvedev (RECOMB 2016) characterized all such safe walks (omnitigs), thus giving the first safe and complete genome assembly algorithm. Even though maximal omnitig finding was later improved to quadratic time by Cairo et al. (ACM Trans. Algorithms 2019), it remained open whether the crucial linear-time feature of finding unitigs can be attained with omnitigs. We answer this question affirmatively, by describing a surprising O(m)-time algorithm to identify all maximal omnitigs of a graph with n nodes and m arcs, notwithstanding the existence of families of graphs with Θ(mn) total maximal omnitig size. This is based on the discovery of a family of walks (macrotigs) with the property that all the non-trivial omnitigs are univocal extensions of subwalks of a macrotig. This has two consequences: (1) A linear-time output-sensitive algorithm enumerating all maximal omnitigs. (2) A compact O(m) representation of all maximal omnitigs, which allows, e.g., for O(m)-time computation of various statistics on them. Our results close a long-standing theoretical question inspired by practical genome assemblers, originating with the use of unitigs in 1995. We envision our results to be at the core of a reverse transfer from theory to practical and complete genome assembly programs, as has been the case for other key Bioinformatics problems.","PeriodicalId":50922,"journal":{"name":"ACM Transactions on Algorithms","volume":" 14","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135292817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Improved Algorithm for The <i>k</i>-Dyck Edit Distance Problem","authors":"Dvir Fried, Shay Golan, Tomasz Kociumaka, Tsvi Kopelowitz, Ely Porat, Tatiana Starikovskaya","doi":"10.1145/3627539","DOIUrl":"https://doi.org/10.1145/3627539","url":null,"abstract":"A Dyck sequence is a sequence of opening and closing parentheses (of various types) that is balanced. The Dyck edit distance of a given sequence of parentheses S is the smallest number of edit operations (insertions, deletions, and substitutions) needed to transform S into a Dyck sequence. We consider the threshold Dyck edit distance problem, where the input is a sequence of parentheses S and a positive integer k, and the goal is to compute the Dyck edit distance of S only if the distance is at most k, and otherwise report that the distance is larger than k. Backurs and Onak [PODS’16] showed that the threshold Dyck edit distance problem can be solved in O(n + k<sup>16</sup>) time. In this work, we design new algorithms for the threshold Dyck edit distance problem that run in O(n + k<sup>4.544184</sup>) time with high probability or O(n + k<sup>4.853059</sup>) time deterministically. Our algorithms combine several new structural properties of the Dyck edit distance problem, a refined algorithm for fast (min, +) matrix product, and a careful modification of ideas used in Valiant’s parsing algorithm.","PeriodicalId":50922,"journal":{"name":"ACM Transactions on Algorithms","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135666559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cluster Editing parameterized above modification-disjoint <i>P</i><sub>3</sub>-packings","authors":"Shaohua Li, Marcin Pilipczuk, Manuel Sorge","doi":"10.1145/3626526","DOIUrl":"https://doi.org/10.1145/3626526","url":null,"abstract":"Given a graph G = (V, E) and an integer k, the Cluster Editing problem asks whether we can transform G into a union of vertex-disjoint cliques by at most k modifications (edge deletions or insertions). In this paper, we study the following variant of Cluster Editing. We are given a graph G = (V, E), a packing \\(\\mathcal{H}\\) of modification-disjoint induced P<sub>3</sub>s (no pair of P<sub>3</sub>s in \\(\\mathcal{H}\\) share an edge or non-edge) and an integer ℓ. The task is to decide whether G can be transformed into a union of vertex-disjoint cliques by at most \\(\\ell + |\\mathcal{H}|\\) modifications (edge deletions or insertions). We show that this problem is NP-hard even when ℓ = 0 (in which case the problem asks to turn G into a disjoint union of cliques by performing exactly one edge deletion or insertion per element of \\(\\mathcal{H}\\)) and when each vertex is in at most 23 P<sub>3</sub>s of the packing. This answers negatively a question of van Bevern, Froese, and Komusiewicz (CSR 2016, ToCS 2018), repeated by C. Komusiewicz at Shonan meeting no. 144 in March 2019. We then initiate the study to find the largest integer c such that the problem remains tractable when restricting to packings such that each vertex is in at most c packed P<sub>3</sub>s. Here packed P<sub>3</sub>s are those belonging to the packing \\(\\mathcal{H}\\). Van Bevern et al. showed that the case c = 1 is fixed-parameter tractable with respect to ℓ and we show that the case c = 2 is solvable in |V|<sup>2ℓ + O(1)</sup> time.","PeriodicalId":50922,"journal":{"name":"ACM Transactions on Algorithms","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136211408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lars Gottesbüren, Tobias Heuer, Nikolai Maas, Peter Sanders, Sebastian Schlag
{"title":"Scalable High-Quality Hypergraph Partitioning","authors":"Lars Gottesbüren, Tobias Heuer, Nikolai Maas, Peter Sanders, Sebastian Schlag","doi":"10.1145/3626527","DOIUrl":"https://doi.org/10.1145/3626527","url":null,"abstract":"Balanced hypergraph partitioning is an NP-hard problem with many applications, e.g., optimizing communication in distributed data placement problems. The goal is to place all nodes across k different blocks of bounded size, such that hyperedges span as few parts as possible. This problem is well-studied in sequential and distributed settings, but not in shared-memory. We close this gap by devising efficient and scalable shared-memory algorithms for all components employed in the best sequential solvers without compromises with regard to solution quality. This work presents the scalable and high-quality hypergraph partitioning framework Mt-KaHyPar. Its most important components are parallel improvement algorithms based on the FM algorithm and maximum flows, as well as a parallel clustering algorithm for coarsening, which are used in a multilevel scheme with log(n) levels. As additional components, we parallelize the n-level partitioning scheme, devise a deterministic version of our algorithm, and present optimizations for plain graphs. We evaluate our solver on more than 800 graphs and hypergraphs, and compare it with 25 different algorithms from the literature. Our fastest configuration outperforms almost all existing hypergraph partitioners with regard to both solution quality and running time. Our highest-quality configuration achieves the same solution quality as the best sequential partitioner KaHyPar, while being an order of magnitude faster with ten threads. Thus, two of our configurations occupy all fronts of the Pareto curve for hypergraph partitioning. Furthermore, our solvers exhibit good speedups, e.g., 29.6x in the geometric mean on 64 cores (deterministic), 22.3x (log(n)-level), and 25.9x (n-level).","PeriodicalId":50922,"journal":{"name":"ACM Transactions on Algorithms","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135141362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Near-Optimal Time-Energy Trade-Offs for Deterministic Leader Election","authors":"Yi-Jun Chang, Ran Duan, Shunhua Jiang","doi":"10.1145/3614429","DOIUrl":"https://doi.org/10.1145/3614429","url":null,"abstract":"We consider the energy complexity of the leader election problem in the single-hop radio network model, where each device v has a unique identifier ID(v) ∈ {1, 2, …, N}. Energy is a scarce resource for small battery-powered devices. For such devices, most of the energy is often spent on communication, not on computation. To approximate the actual energy cost, the energy complexity of an algorithm is defined as the maximum over all devices of the number of time slots where the device transmits or listens. Much progress has been made in understanding the energy complexity of leader election in radio networks, but very little is known about the tradeoff between time and energy. Chang et al. [STOC 2017] showed that the optimal deterministic energy complexity of leader election is Θ(log log N) if each device can simultaneously transmit and listen, but left open the problem of determining the optimal time complexity under any given energy constraint. Time–energy tradeoff: For any k ≥ log log N, we show that a leader among at most n devices can be elected deterministically in O(k · n<sup>1+ε</sup>) + O(k · N<sup>1/k</sup>) time and O(k) energy if each device can simultaneously transmit and listen, where ε > 0 is any small constant. This improves upon the previous O(N)-time O(log log N)-energy algorithm by Chang et al. [STOC 2017]. We provide lower bounds to show that the time–energy tradeoff of our algorithm is near-optimal. Dense instances: For the dense instances where the number of devices is n = Θ(N), we design a deterministic leader election algorithm using only O(1) energy. This improves upon the O(log* N)-energy algorithm by Jurdziński, Kutyłowski, and Zatopiański [PODC 2002] and the O(α(N))-energy algorithm by Chang et al. [STOC 2017]. More specifically, we show that the optimal deterministic energy complexity of leader election is \\(\\Theta(\\max\\lbrace 1, \\log \\tfrac{N}{n}\\rbrace)\\) if each device cannot simultaneously transmit and listen, and it is \\(\\Theta(\\max\\lbrace 1, \\log \\log \\tfrac{N}{n}\\rbrace)\\) if each device can simultaneously transmit and listen.","PeriodicalId":50922,"journal":{"name":"ACM Transactions on Algorithms","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134904251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Minimum+1 (s,t)-cuts and Dual Edge Sensitivity Oracle","authors":"Surender Baswana, Koustav Bhanja, Abhyuday Pandey","doi":"10.1145/3623271","DOIUrl":"https://doi.org/10.1145/3623271","url":null,"abstract":"Let G be a directed multi-graph on n vertices and m edges with a designated source vertex s and a designated sink vertex t. We study the (s, t)-cuts of capacity minimum+1 and as an important application of them, we give a solution to the dual edge sensitivity for (s, t)-mincuts – reporting an (s, t)-mincut upon failure or insertion of any pair of edges. Picard and Queyranne [Mathematical Programming Studies, 13(1):8-16, 1980] showed that there exists a directed acyclic graph (DAG) that compactly stores all minimum (s, t)-cuts of G. This structure also acts as an oracle for the single edge sensitivity of minimum (s, t)-cut. For undirected multi-graphs, Dinitz and Nutov [STOC, pages 509-518, 1995] showed that there exists an \\(\\mathcal{O}(n)\\) size 2-level cactus model that stores all global cuts of capacity minimum+1. However, for minimum+1 (s, t)-cuts, no such compact structure exists till date. We present the following structural and algorithmic results on minimum+1 (s, t)-cuts. (1) Structure: There is an \\(\\mathcal{O}(m)\\) size 2-level DAG structure that stores all minimum+1 (s, t)-cuts of G such that each minimum+1 (s, t)-cut appears as a 3-transversal cut – it intersects any path in this structure at most thrice. We also show that there is an \\(\\mathcal{O}(mn)\\) size structure for storing and characterizing all minimum+1 (s, t)-cuts in terms of 1-transversal cuts. (2) Data structure: There exists an \\(\\mathcal{O}(n^2)\\) size data structure that, given a pair of vertices {u, v} which are not separated by an (s, t)-mincut, can determine in \\(\\mathcal{O}(1)\\) time if there exists a minimum+1 (s, t)-cut, say (A, B), such that s, u ∈ A and v, t ∈ B; the corresponding cut can be reported in \\(\\mathcal{O}(|B|)\\) time. (3) Sensitivity oracle: There exists an \\(\\mathcal{O}(n^2)\\) size data structure that solves the dual edge sensitivity problem for (s, t)-mincuts. It takes \\(\\mathcal{O}(1)\\) time to report the capacity of a resulting (s, t)-mincut (A, B) and \\(\\mathcal{O}(|B|)\\) time to report the cut. (4) Lower bounds: For the data structure problems addressed in (2) and (3) above, we also provide a matching conditional lower bound. We establish a close relationship among three seemingly unrelated problems – all-pairs directed reachability problem, the dual edge sensitivity problem for (s, t)-mincuts, and the problem of reporting the capacity of ({x, y}, {u, v})-mincut for any four vertices x, y, u, v in G. Assuming the Directed Reachability Hypothesis by Patrascu [SIAM J. Computing, pages 827–847, 2011] and Goldstein et al. [WADS, pages 421-436, 2017], this leads to \\(\\tilde{\\Omega}(n^2)\\) lower bounds on the space for the latter two problems.","PeriodicalId":50922,"journal":{"name":"ACM Transactions on Algorithms","volume":"1 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2023-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42177549","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Static and Streaming Data Structures for Fréchet Distance Queries","authors":"Arnold Filtser, Omrit Filtser","doi":"10.1145/3610227","DOIUrl":"https://doi.org/10.1145/3610227","url":null,"abstract":"<p>Given a curve <i>P</i> with points in \\(\\mathbb{R}^d\\) in a streaming fashion, and parameters ε > 0 and <i>k</i>, we construct a distance oracle that uses \\(O(\\frac{1}{\\varepsilon})^{kd}\\log \\varepsilon^{-1}\\) space, and given a query curve <i>Q</i> with <i>k</i> points in \\(\\mathbb{R}^d\\), returns in \\(\\tilde{O}(kd)\\) time a (1 + ε)-approximation of the discrete Fréchet distance between <i>Q</i> and <i>P</i>. In addition, we construct simplifications in the streaming model, oracle for distance queries to a sub-curve (in the static setting), and introduce the zoom-in problem. Our algorithms work in any dimension <i>d</i>, and therefore we generalize some useful tools and algorithms for curves under the discrete Fréchet distance to work efficiently in high dimensions.</p>","PeriodicalId":50922,"journal":{"name":"ACM Transactions on Algorithms","volume":"7 13","pages":""},"PeriodicalIF":1.3,"publicationDate":"2023-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138494906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"String Indexing with Compressed Patterns","authors":"Philip Bille, Inge Li Gørtz, Teresa Anna Steiner","doi":"10.1145/3607141","DOIUrl":"https://doi.org/10.1145/3607141","url":null,"abstract":"<p>Given a string <i>S</i> of length <i>n</i>, the classic string indexing problem is to preprocess <i>S</i> into a compact data structure that supports efficient subsequent pattern queries. In this paper we consider the basic variant where the pattern is given in compressed form and the goal is to achieve query time that is fast in terms of the compressed size of the pattern. This captures the common client-server scenario, where a client submits a query and communicates it in compressed form to a server. Instead of the server decompressing the query before processing it, we consider how to efficiently process the compressed query directly. Our main result is a novel linear space data structure that achieves near-optimal query time for patterns compressed with the classic Lempel-Ziv 1977 (LZ77) compression scheme. Along the way we develop several data structural techniques of independent interest, including a novel data structure that compactly encodes all LZ77 compressed suffixes of a string in linear space and a general decomposition of tries that reduces the search time from logarithmic in the size of the trie to logarithmic in the length of the pattern.</p>","PeriodicalId":50922,"journal":{"name":"ACM Transactions on Algorithms","volume":"7 14","pages":""},"PeriodicalIF":1.3,"publicationDate":"2023-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138494905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kevin Buchin, Chenglin Fan, Maarten Löffler, Aleksandr Popov, Benjamin Raichel, Marcel Roeloffzen
{"title":"Fréchet Distance for Uncertain Curves","authors":"Kevin Buchin, Chenglin Fan, Maarten Löffler, Aleksandr Popov, Benjamin Raichel, Marcel Roeloffzen","doi":"10.1145/3597640","DOIUrl":"https://doi.org/10.1145/3597640","url":null,"abstract":"<p>In this article, we study a wide range of variants for computing the (discrete and continuous) Fréchet distance between uncertain curves. An uncertain curve is a sequence of <i>uncertainty regions,</i> where each region is a disk, a line segment, or a set of points. A <i>realisation</i> of a curve is a polyline connecting one point from each region. Given an uncertain curve and a second (certain or uncertain) curve, we seek to compute the lower and upper bound Fréchet distance, which are the minimum and maximum Fréchet distance for any realisations of the curves. </p><p>We prove that both problems are NP-hard for the Fréchet distance in several uncertainty models, and that the upper bound problem remains hard for the discrete Fréchet distance. In contrast, the lower bound (discrete [5] and continuous) Fréchet distance can be computed in polynomial time in some models. Furthermore, we show that computing the expected (discrete and continuous) Fréchet distance is #P-hard in some models.</p><p>On the positive side, we present an FPTAS in constant dimension for the lower bound problem when Δ/δ is polynomially bounded, where δ is the Fréchet distance and Δ bounds the diameter of the regions. We also show a near-linear-time 3-approximation for the decision problem on roughly δ-separated convex regions. Finally, we study the setting with Sakoe–Chiba time bands, where we restrict the alignment between the curves, and give polynomial-time algorithms for the upper bound and expected discrete and continuous Fréchet distance for uncertainty modelled as point sets.</p>","PeriodicalId":50922,"journal":{"name":"ACM Transactions on Algorithms","volume":"7 16","pages":""},"PeriodicalIF":1.3,"publicationDate":"2023-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138494901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Matching on the Line Admits no \\(o(\\sqrt{\\log n})\\)-Competitive Algorithm","authors":"Enoch Peserico, Michele Scquizzato","doi":"10.1145/3594873","DOIUrl":"https://doi.org/10.1145/3594873","url":null,"abstract":"<p>We present a simple proof that no randomized online matching algorithm for the line can be \\((\\sqrt{\\log_2(n+1)}/15)\\)-competitive against an oblivious adversary for any <i>n</i> = 2<sup><i>i</i></sup> - 1 : <i>i</i> ∈ ℕ. This is the first super-constant lower bound for the problem, and disproves as a corollary a recent conjecture on the topology-parametrized competitiveness achievable on generic spaces.</p>","PeriodicalId":50922,"journal":{"name":"ACM Transactions on Algorithms","volume":"7 15","pages":""},"PeriodicalIF":1.3,"publicationDate":"2023-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138494904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}