Giulia Bernardini, Esteban Gabory, Solon P. Pissis, Leen Stougie, Michelle Sweering, Wiktor Zuba
{"title":"Elastic-Degenerate String Matching with 1 Error or Mismatch","authors":"Giulia Bernardini, Esteban Gabory, Solon P. Pissis, Leen Stougie, Michelle Sweering, Wiktor Zuba","doi":"10.1007/s00224-024-10194-8","DOIUrl":"https://doi.org/10.1007/s00224-024-10194-8","url":null,"abstract":"<p>An elastic-degenerate (ED) string is a sequence of <i>n</i> finite sets of strings of total length <i>N</i>, introduced to represent a set of related DNA sequences, also known as a <i>pangenome</i>. The ED string matching (EDSM) problem consists in reporting all occurrences of a pattern of length <i>m</i> in an ED text. The EDSM problem has recently received some attention from the combinatorial pattern matching community, culminating in an <span>\\(\\mathcal{\\tilde{O}}(nm^{\\omega-1})+\\mathcal{O}(N)\\)</span>-time algorithm [Bernardini et al., SIAM J. Comput. 2022], where <span>\\(\\omega\\)</span> denotes the matrix multiplication exponent and the <span>\\(\\mathcal{\\tilde{O}}(\\cdot)\\)</span> notation suppresses polylog factors. In the <i>k</i>-EDSM problem, the approximate version of EDSM, we are asked to report all pattern occurrences with at most <i>k</i> errors. <i>k</i>-EDSM can be solved in <span>\\(\\mathcal{O}(k^2mG+kN)\\)</span> time, under edit distance, or <span>\\(\\mathcal{O}(kmG+kN)\\)</span> time, under Hamming distance, where <i>G</i> denotes the total number of strings in the ED text [Bernardini et al., Theor. Comput. Sci. 2020]. Unfortunately, <i>G</i> is only bounded by <i>N</i>, and so even for <span>\\(k=1\\)</span>, the existing algorithms run in <span>\\(\\varOmega(mN)\\)</span> time in the worst case. In this paper we make progress in this direction. We show that 1-EDSM can be solved in <span>\\(\\mathcal{O}((nm^2 + N)\\log m)\\)</span> or <span>\\(\\mathcal{O}(nm^3 + N)\\)</span> time under edit distance. For the decision version of the problem, we present a faster <span>\\(\\mathcal{O}(nm^2\\sqrt{\\log m} + N\\log \\log m)\\)</span>-time algorithm. We also show that 1-EDSM can be solved in <span>\\(\\mathcal{O}(nm^2 + N\\log m)\\)</span> time under Hamming distance. Our algorithms for edit distance rely on non-trivial reductions from 1-EDSM to special instances of classic computational geometry problems (2d rectangle stabbing or 2d range emptiness), which we show how to solve efficiently. In order to obtain an even faster algorithm for Hamming distance, we employ and adapt the <i>k</i>-errata trees for indexing with errors [Cole et al., STOC 2004]. This is an extended version of a paper presented at LATIN 2022.</p>","PeriodicalId":22832,"journal":{"name":"Theory of Computing Systems","volume":null,"pages":null},"PeriodicalIF":0.5,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142259368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
France Gheeraert, Giuseppe Romana, Manon Stipulanti
{"title":"String Attractors of Some Simple-Parry Automatic Sequences","authors":"France Gheeraert, Giuseppe Romana, Manon Stipulanti","doi":"10.1007/s00224-024-10195-7","DOIUrl":"https://doi.org/10.1007/s00224-024-10195-7","url":null,"abstract":"<p>First studied by Kempa and Prezza in 2018 as the unifying idea behind text compression algorithms, string attractors have become a compelling object of theoretical research within the community of combinatorics on words. In this context, they have been studied for several families of finite and infinite words. In this paper, we focus on string attractors of prefixes of particular automatic infinite words (including the famous period-doubling and <i>k</i>-bonacci words) related to simple-Parry numbers. For a subfamily of these words, we describe string attractors of optimal size, while for the rest of them, we provide nearly optimal-size ones. Such a contribution is of particular interest, since finding a smallest string attractor is NP-hard in general. This extends our previous work published in the international conference WORDS 2023.</p>","PeriodicalId":22832,"journal":{"name":"Theory of Computing Systems","volume":null,"pages":null},"PeriodicalIF":0.5,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142205309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Jumping Automata over Infinite Words","authors":"Shaull Almagor, Omer Yizhaq","doi":"10.1007/s00224-024-10192-w","DOIUrl":"https://doi.org/10.1007/s00224-024-10192-w","url":null,"abstract":"<p>Jumping automata are finite automata that read their input in a non-consecutive manner, disregarding the order of the letters in the word. We introduce and study jumping automata over infinite words. Unlike the setting of finite words, which has been well studied, for infinite words it is not clear how words can be reordered. To this end, we consider three semantics: automata that read the infinite word in some order so that no letter is overlooked, automata that can permute the word in windows of a given size k, and automata that can permute the word in windows of an existentially-quantified bound. We study expressiveness, closure properties and algorithmic properties of these models.</p>","PeriodicalId":22832,"journal":{"name":"Theory of Computing Systems","volume":null,"pages":null},"PeriodicalIF":0.5,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142205310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Solution Sets of Three-Variable Word Equations","authors":"Aleksi Saarela","doi":"10.1007/s00224-024-10193-9","DOIUrl":"https://doi.org/10.1007/s00224-024-10193-9","url":null,"abstract":"<p>It is known that the set of solutions of any constant-free three-variable word equation can be represented using parametric words, and the number of numerical parameters and the level of nesting in these parametric words is at most logarithmic with respect to the length of the equation. We show that this result can be significantly improved in the case of unbalanced equations, that is, equations where at least one variable has a different number of occurrences on the left-hand side and on the right-hand side. More specifically, it is sufficient to have two numerical parameters and one level of nesting in this case. We also discuss the possibility of proving a similar result for balanced equations in the future.</p>","PeriodicalId":22832,"journal":{"name":"Theory of Computing Systems","volume":null,"pages":null},"PeriodicalIF":0.5,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142205311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Near-Optimal Auctions on Independence Systems","authors":"Sabrina C. L. Ammann, Sebastian Stiller","doi":"10.1007/s00224-024-10189-5","DOIUrl":"https://doi.org/10.1007/s00224-024-10189-5","url":null,"abstract":"<p>A classical result by Myerson (Math. Oper. Res. <b>6</b>(1), 58-73, 1981) gives a characterization of an optimal auction for any given distribution of valuations of the bidders. We consider the situation where the distribution is not explicitly given but can be observed in a sample of auction results from the same distribution. A seminal paper by Morgenstern and Roughgarden (Adv. Neural Inf. Process. Syst. <b>28</b>, 2015) proposes to learn a near-optimal auction from the hypothesis class of <i>t</i>-level auctions. They prove a bound on the sample complexity, i.e., the function <span>\\(f(\\varepsilon, \\delta)\\)</span> of required samples to guarantee a certain level of precision <span>\\((1-\\varepsilon)\\)</span> with a probability of at least <span>\\((1-\\delta)\\)</span>, for the general single-parameter case and a tighter bound for the very restricted matroid case. We show a new bound for the case of independence systems, which widely generalize matroids and contain several important combinatorial optimization problems. This bound of <span>\\(\\tilde{O}\\left(H^2n^4/\\varepsilon^3\\right)\\)</span> falls neatly between those known for the general and the matroid case. The class of independence systems contains several well-known NP-hard problems such as knapsack. Therefore, the allocation itself might in practice be limited to <span>\\(\\alpha\\)</span>-approximate solutions. In a second result we show that an approximation algorithm can be used without compromising the sample complexity. Also, the precision is affected only mildly, resulting in a factor of <span>\\(\\alpha \\cdot (1-\\varepsilon)\\)</span>.</p>","PeriodicalId":22832,"journal":{"name":"Theory of Computing Systems","volume":null,"pages":null},"PeriodicalIF":0.5,"publicationDate":"2024-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142205312","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lubomíra Dvořáková, Zuzana Masáková, Edita Pelantová
{"title":"2-Balanced Sequences Coding Rectangle Exchange Transformation","authors":"Lubomíra Dvořáková, Zuzana Masáková, Edita Pelantová","doi":"10.1007/s00224-024-10188-6","DOIUrl":"https://doi.org/10.1007/s00224-024-10188-6","url":null,"abstract":"<p>We define a new class of ternary sequences that are 2-balanced. These sequences are obtained by colouring Sturmian sequences. We show that the class contains sequences of any given letter frequencies. We provide an upper bound on factor and abelian complexity of these sequences. Using the interpretation by rectangle exchange transformation, we prove that for almost all triples of letter frequencies, the upper bound on factor and abelian complexity is reached. The bound on factor complexity is given using a number-theoretical function which we compute explicitly for a class of parameters.</p>","PeriodicalId":22832,"journal":{"name":"Theory of Computing Systems","volume":null,"pages":null},"PeriodicalIF":0.5,"publicationDate":"2024-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141943934","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Obstructions to Return Preservation for Episturmian Morphisms","authors":"Valérie Berthé, Herman Goulet-Ouellet","doi":"10.1007/s00224-024-10190-y","DOIUrl":"https://doi.org/10.1007/s00224-024-10190-y","url":null,"abstract":"<p>This paper studies obstructions to preservation of return sets by episturmian morphisms. We show, by way of an explicit construction, that infinitely many obstructions exist. This generalizes and improves an earlier result about Sturmian morphisms.</p>","PeriodicalId":22832,"journal":{"name":"Theory of Computing Systems","volume":null,"pages":null},"PeriodicalIF":0.5,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141887400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Non-Constructive Upper Bounds for Repetition Thresholds","authors":"Arseny M. Shur","doi":"10.1007/s00224-024-10187-7","DOIUrl":"https://doi.org/10.1007/s00224-024-10187-7","url":null,"abstract":"<p>We study the power of entropy compression in proving avoidance results in combinatorics on words. Namely, we analyze variants of a simple algorithm that transforms an input word into a word avoiding repetitions of prescribed type. This transformation can be made reversible by adding the log of the run of the algorithm to the output. Counting distinct logs, it is possible to conclude that a given repetition is avoidable over all sufficiently large alphabets. We introduce two methods of counting logs. Applying them to ordinary, undirected, and conjugate repetitions, we prove, in all cases, the results of type “<span>\\((1+\\frac{1}{d})\\)</span>-powers are avoidable over <span>\\(d+O(1)\\)</span> letters”. These results are closer to the optimum than is usually expected from purely information-theoretic considerations. In the final part, we present experimental results obtained by the mentioned transformation algorithm in the extreme case of <span>\\((d+1)\\)</span>-ary words avoiding <span>\\((1+\\frac{1}{d})^+\\!\\)</span>-powers.</p>","PeriodicalId":22832,"journal":{"name":"Theory of Computing Systems","volume":null,"pages":null},"PeriodicalIF":0.5,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141885344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"International Colloquium on Automata, Languages and Programming (ICALP 2020)","authors":"A. Dawar","doi":"10.1007/s00224-024-10185-9","DOIUrl":"https://doi.org/10.1007/s00224-024-10185-9","url":null,"abstract":"","PeriodicalId":22832,"journal":{"name":"Theory of Computing Systems","volume":null,"pages":null},"PeriodicalIF":0.6,"publicationDate":"2024-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141639936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}