Massimo Equi, Veli Mäkinen, Alexandru I. Tomescu, Roberto Grossi
On the Complexity of String Matching for Graphs
ACM Transactions on Algorithms, published 2023-04-12
DOI: https://dl.acm.org/doi/10.1145/3588334
Citations: 0
Abstract
Exact string matching in labeled graphs is the problem of finding paths of a graph G=(V, E) such that the concatenation of their node labels is equal to a given pattern string P[1..m]. This basic problem lies at the heart of more complex operations on variation graphs in computational biology, of query operations in graph databases, and of analysis operations in heterogeneous networks.
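The problem statement above can be illustrated with a minimal Python sketch of the straightforward dynamic-programming approach, which runs in O(|E| m) time. It is not taken from the paper. It assumes that every node carries a single-character label, that edges are directed pairs, and that a matching path may revisit nodes; the function name `graph_match` and the input layout are hypothetical choices made for illustration.

```python
from collections import defaultdict


def graph_match(labels, edges, pattern):
    """Return True if some walk in the graph spells `pattern`.

    labels:  dict mapping node -> single-character label (assumption)
    edges:   iterable of directed (u, v) pairs
    pattern: the string P to search for
    """
    if not pattern:
        return True

    # Adjacency lists: out[u] holds every v with an edge (u, v).
    out = defaultdict(list)
    for u, v in edges:
        out[u].append(v)

    # frontier = nodes at which a match of the current pattern prefix can end.
    frontier = {v for v, c in labels.items() if c == pattern[0]}

    for ch in pattern[1:]:
        nxt = set()
        for u in frontier:
            for v in out[u]:
                if labels[v] == ch:
                    nxt.add(v)
        frontier = nxt
        if not frontier:
            return False  # no walk can extend the current prefix

    return bool(frontier)


if __name__ == "__main__":
    labels = {0: "a", 1: "b", 2: "a", 3: "b"}
    edges = [(0, 1), (1, 2), (2, 3)]
    print(graph_match(labels, edges, "abab"))
    print(graph_match(labels, edges, "abb"))
```

Each of the m pattern positions scans every edge at most once, which gives the quadratic O(|E| m) behavior that the lower bound in the abstract refers to.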
We prove a conditional lower bound stating that, for any constant ε > 0, an O(|E|^{1-ε} m) time or an O(|E| m^{1-ε}) time algorithm for exact string matching in graphs, with node labels and pattern drawn from a binary alphabet, cannot be achieved unless the Strong Exponential Time Hypothesis (SETH) is false. This holds even if restricted to undirected graphs with maximum node degree 2 (that is, to zig-zag matching in bidirectional strings), or to deterministic directed acyclic graphs whose nodes have maximum sum of indegree and outdegree 3. These restricted cases make the lower bound stricter than what can be directly derived from related bounds on regular expression matching (Backurs and Indyk, FOCS’16). In fact, our bounds are tight in the sense that lowering the degree or the alphabet size yields linear time solvable problems.
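The restricted case of undirected graphs with maximum degree 2 can be made concrete with a small Python sketch. It reads "zig-zag matching in a bidirectional string" as follows: the text is a path graph whose nodes are the text positions, and each step of a match moves one position left or right. That reading, the restriction to a path (ignoring cycles), and the name `zigzag_match` are assumptions made for illustration, not definitions quoted from the paper.

```python
def zigzag_match(text, pattern):
    """Return True if `pattern` can be spelled by walking over `text`,
    moving one position left or right at every step."""
    if not pattern:
        return True

    n = len(text)
    # Positions at which a match of the current pattern prefix can end.
    frontier = {i for i in range(n) if text[i] == pattern[0]}

    for ch in pattern[1:]:
        frontier = {
            j
            for i in frontier
            for j in (i - 1, i + 1)        # the two neighbors on the path
            if 0 <= j < n and text[j] == ch
        }
        if not frontier:
            return False

    return bool(frontier)


if __name__ == "__main__":
    print(zigzag_match("abc", "abcbab"))  # walks 0,1,2,1,0,1
    print(zigzag_match("abc", "ac"))      # 'a' and 'c' are not adjacent
```

This sketch takes O(n m) time for a text of length n, matching the quadratic bound that, according to the abstract, cannot be improved under SETH even in this restricted setting.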
An interesting corollary is that exact and approximate matching are equally hard (i.e., quadratic time) in graphs under SETH. In comparison, the same problems restricted to strings have linear time and quadratic time solutions, respectively; approximate pattern matching in strings also has a matching SETH lower bound (Backurs and Indyk, STOC’15).
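For the string side of this comparison, the quadratic-time solution for approximate matching can be sketched with the classic column-by-column dynamic program, often attributed to Sellers. The sketch below assumes unit-cost edit distance and returns the smallest number of edits between the pattern and any substring of the text; the function name is a hypothetical choice, and the code is an illustration rather than anything taken from the paper.

```python
def approx_match_min_errors(text, pattern):
    """Smallest edit distance between `pattern` and any substring of `text`.

    Runs in O(len(text) * len(pattern)) time and O(len(pattern)) space.
    """
    m = len(pattern)
    # col[i] = best cost of aligning pattern[:i] with a substring of the
    # text ending at the current position; before any text is read,
    # pattern[:i] can only be matched by i deletions.
    col = list(range(m + 1))
    best = col[m]

    for c in text:
        new = [0] * (m + 1)  # new[0] = 0: a match may start anywhere
        for i in range(1, m + 1):
            new[i] = min(
                col[i] + 1,                          # skip text character c
                new[i - 1] + 1,                      # skip pattern[i-1]
                col[i - 1] + (pattern[i - 1] != c),  # match or substitute
            )
        col = new
        best = min(best, col[m])

    return best


if __name__ == "__main__":
    print(approx_match_min_errors("abcdefg", "cxe"))
```

Exact matching in strings, by contrast, is solvable in linear time with standard algorithms such as Knuth-Morris-Pratt, which is the gap the corollary says disappears in graphs.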
About the journal:
ACM Transactions on Algorithms welcomes submissions of original research of the highest quality dealing with algorithms that are inherently discrete and finite, and having mathematical content in a natural way, either in the objective or in the analysis. Most welcome are new algorithms and data structures, new and improved analyses, and complexity results. Specific areas of computation covered by the journal include
combinatorial searches and objects;
counting;
discrete optimization and approximation;
randomization and quantum computation;
parallel and distributed computation;
algorithms for graphs, geometry, arithmetic, number theory, and strings;
on-line analysis;
cryptography;
coding;
data compression;
learning algorithms;
methods of algorithmic analysis;
discrete algorithms for application areas such as biology, economics, game theory, communication, computer systems and architecture, hardware design, and scientific computing.