Gabriel Bathie, Panagiotis Charalampopoulos, Tatiana Starikovskaya
{"title":"Longest Common Extensions with Wildcards: Trade-off and Applications","authors":"Gabriel Bathie, Panagiotis Charalampopoulos, Tatiana Starikovskaya","doi":"arxiv-2408.03610","DOIUrl":"https://doi.org/arxiv-2408.03610","url":null,"abstract":"We study the Longest Common Extension (LCE) problem in a string containing wildcards. Wildcards (also called \"don't cares\" or \"holes\") are special characters that match any other character in the alphabet, similar to the character \"?\" in Unix commands or \".\" in regular expression engines. We consider the problem parametrized by $G$, the number of maximal contiguous groups of wildcards in the input string. Our main contribution is a simple data structure for this problem that can be built in $O(n (G/t) \\log n)$ time, occupies $O(nG/t)$ space, and answers queries in $O(t)$ time, for any $t \\in [1 .. G]$. Up to the $O(\\log n)$ factor, this interpolates smoothly between the data structure of Crochemore et al. [JDA 2015], which has $O(nG)$ preprocessing time and space, and $O(1)$ query time, and a simple solution based on the ``kangaroo jumping'' technique [Landau and Vishkin, STOC 1986], which has $O(n)$ preprocessing time and space, and $O(G)$ query time. By establishing a connection between this problem and Boolean matrix multiplication, we show that our solution is optimal up to subpolynomial factors when $G = \\Omega(n)$ under a widely believed hypothesis. In addition, we develop a new simple, deterministic and combinatorial algorithm for sparse Boolean matrix multiplication. Finally, we show that our data structure can be used to obtain efficient algorithms for approximate pattern matching and structural analysis of strings with wildcards.","PeriodicalId":501525,"journal":{"name":"arXiv - CS - Data Structures and Algorithms","volume":"13 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141946274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Faster and simpler online/sliding rightmost Lempel-Ziv factorizations","authors":"Wataru Sumiyoshi, Takuya Mieno, Shunsuke Inenaga","doi":"arxiv-2408.03008","DOIUrl":"https://doi.org/arxiv-2408.03008","url":null,"abstract":"We tackle the problems of computing the rightmost variant of the Lempel-Ziv factorizations in the online/sliding model. Previous best bounds for this problem are O(n log n) time with O(n) space, due to Amir et al. [IPL 2002] for the online model, and due to Larsson [CPM 2014] for the sliding model. In this paper, we present faster O(n log n/log log n)-time solutions to both of the online/sliding models. Our algorithms are built on a simple data structure named BP-linked trees, and on a slightly improved version of the range minimum/maximum query (RmQ/RMQ) data structure on a dynamic list of integers. We also present other applications of our algorithms.","PeriodicalId":501525,"journal":{"name":"arXiv - CS - Data Structures and Algorithms","volume":"8 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141946275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Online Deterministic Minimum Cost Bipartite Matching with Delays on a Line","authors":"Tung-Wei Kuo","doi":"arxiv-2408.02526","DOIUrl":"https://doi.org/arxiv-2408.02526","url":null,"abstract":"We study the online minimum cost bipartite perfect matching with delays problem. In this problem, $m$ servers and $m$ requests arrive over time, and an online algorithm can delay the matching between servers and requests by paying the delay cost. The objective is to minimize the total distance and delay cost. When servers and requests lie in a known metric space, there is a randomized $O(\\log n)$-competitive algorithm, where $n$ is the size of the metric space. When the metric space is unknown a priori, Azar and Jacob-Fanani proposed a deterministic $O\\left(\\frac{1}{\\epsilon}m^{\\log\\left(\\frac{3+\\epsilon}{2}\\right)}\\right)$-competitive algorithm for any fixed $\\epsilon > 0$. This competitive ratio is tight when $n = 1$ and becomes $O(m^{0.59})$ for sufficiently small $\\epsilon$. In this paper, we improve upon the result of Azar and Jacob-Fanani for the case where servers and requests are on the real line, providing a deterministic $\\tilde{O}(m^{0.5})$-competitive algorithm. Our algorithm is based on the Robust Matching (RM) algorithm proposed by Raghvendra for the minimum cost bipartite perfect matching problem. In this problem, delay is not allowed, and all servers arrive in the beginning. When a request arrives, the RM algorithm immediately matches the request to a free server based on the request's minimum $t$-net-cost augmenting path, where $t > 1$ is a constant. In our algorithm, we delay the matching of a request until its waiting time exceeds its minimum $t$-net-cost divided by $t$.","PeriodicalId":501525,"journal":{"name":"arXiv - CS - Data Structures and Algorithms","volume":"2013 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141946420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved Bounds for High-Dimensional Equivalence and Product Testing using Subcube Queries","authors":"Tomer Adar, Eldar Fischer, Amit Levi","doi":"arxiv-2408.02347","DOIUrl":"https://doi.org/arxiv-2408.02347","url":null,"abstract":"We study property testing in the subcube conditional model introduced by Bhattacharyya and Chakraborty (2017). We obtain the first equivalence test for $n$-dimensional distributions that is quasi-linear in $n$, improving the previously known $\\tilde{O}(n^2/\\varepsilon^2)$ query complexity bound to $\\tilde{O}(n/\\varepsilon^2)$. We extend this result to general finite alphabets with logarithmic cost in the alphabet size. By exploiting the specific structure of the queries that we use (which are more restrictive than general subcube queries), we obtain a cubic improvement over the best known test for distributions over $\\{1,\\ldots,N\\}$ under the interval querying model of Canonne, Ron and Servedio (2015), attaining a query complexity of $\\tilde{O}((\\log N)/\\varepsilon^2)$, which for fixed $\\varepsilon$ almost matches the known lower bound of $\\Omega((\\log N)/\\log\\log N)$. We also derive a product test for $n$-dimensional distributions with $\\tilde{O}(n / \\varepsilon^2)$ queries, and provide an $\\Omega(\\sqrt{n} / \\varepsilon^2)$ lower bound for this property.","PeriodicalId":501525,"journal":{"name":"arXiv - CS - Data Structures and Algorithms","volume":"27 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141946421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ilias Diakonikolas, Sushrut Karmalkar, Jongho Park, Christos Tzamos
{"title":"First Order Stochastic Optimization with Oblivious Noise","authors":"Ilias Diakonikolas, Sushrut Karmalkar, Jongho Park, Christos Tzamos","doi":"arxiv-2408.02090","DOIUrl":"https://doi.org/arxiv-2408.02090","url":null,"abstract":"We initiate the study of stochastic optimization with oblivious noise, broadly generalizing the standard heavy-tailed noise setup. In our setting, in addition to random observation noise, the stochastic gradient may be subject to independent oblivious noise, which may not have bounded moments and is not necessarily centered. Specifically, we assume access to a noisy oracle for the stochastic gradient of $f$ at $x$, which returns a vector $\\nabla f(\\gamma, x) + \\xi$, where $\\gamma$ is the bounded variance observation noise and $\\xi$ is the oblivious noise that is independent of $\\gamma$ and $x$. The only assumption we make on the oblivious noise $\\xi$ is that $\\mathbf{Pr}[\\xi = 0] \\ge \\alpha$ for some $\\alpha \\in (0, 1)$. In this setting, it is not information-theoretically possible to recover a single solution close to the target when the fraction of inliers $\\alpha$ is less than $1/2$. Our main result is an efficient list-decodable learner that recovers a small list of candidates, at least one of which is close to the true solution. On the other hand, if $\\alpha = 1-\\epsilon$, where $0< \\epsilon < 1/2$ is sufficiently small constant, the algorithm recovers a single solution. Along the way, we develop a rejection-sampling-based algorithm to perform noisy location estimation, which may be of independent interest.","PeriodicalId":501525,"journal":{"name":"arXiv - CS - Data Structures and Algorithms","volume":"27 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141946422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Anders Aamand, Justin Y. Chen, Mina Dalirrooyfard, Slobodan Mitrović, Yuriy Nevmyvaka, Sandeep Silwal, Yinzhan Xu
{"title":"Differentially Private Gomory-Hu Trees","authors":"Anders Aamand, Justin Y. Chen, Mina Dalirrooyfard, Slobodan Mitrović, Yuriy Nevmyvaka, Sandeep Silwal, Yinzhan Xu","doi":"arxiv-2408.01798","DOIUrl":"https://doi.org/arxiv-2408.01798","url":null,"abstract":"Given an undirected, weighted $n$-vertex graph $G = (V, E, w)$, a Gomory-Hu tree $T$ is a weighted tree on $V$ such that for any pair of distinct vertices $s, t \\in V$, the Min-$s$-$t$-Cut on $T$ is also a Min-$s$-$t$-Cut on $G$. Computing a Gomory-Hu tree is a well-studied problem in graph algorithms and has received considerable attention. In particular, a long line of work recently culminated in constructing a Gomory-Hu tree in almost linear time [Abboud, Li, Panigrahi and Saranurak, FOCS 2023]. We design a differentially private (DP) algorithm that computes an approximate Gomory-Hu tree. Our algorithm is $\\varepsilon$-DP, runs in polynomial time, and can be used to compute $s$-$t$ cuts that are $\\tilde{O}(n/\\varepsilon)$-additive approximations of the Min-$s$-$t$-Cuts in $G$ for all distinct $s, t \\in V$ with high probability. Our error bound is essentially optimal, as [Dalirrooyfard, Mitrović and Nevmyvaka, NeurIPS 2023] showed that privately outputting a single Min-$s$-$t$-Cut requires $\\Omega(n)$ additive error even with $(1, 0.1)$-DP and allowing for a multiplicative error term. Prior to our work, the best additive error bounds for approximate all-pairs Min-$s$-$t$-Cuts were $O(n^{3/2}/\\varepsilon)$ for $\\varepsilon$-DP [Gupta, Roth and Ullman, TCC 2012] and $O(\\sqrt{mn} \\cdot \\text{polylog}(n/\\delta) / \\varepsilon)$ for $(\\varepsilon, \\delta)$-DP [Liu, Upadhyay and Zou, SODA 2024], both of which are implied by differential private algorithms that preserve all cuts in the graph. An important technical ingredient of our main result is an $\\varepsilon$-DP algorithm for computing minimum Isolating Cuts with $\\tilde{O}(n / \\varepsilon)$ additive error, which may be of independent interest.","PeriodicalId":501525,"journal":{"name":"arXiv - CS - Data Structures and Algorithms","volume":"86 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141946423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fully Dynamic $k$-Clustering with Fast Update Time and Small Recourse","authors":"Sayan Bhattacharya, Martín Costa, Naveen Garg, Silvio Lattanzi, Nikos Parotsidis","doi":"arxiv-2408.01325","DOIUrl":"https://doi.org/arxiv-2408.01325","url":null,"abstract":"In the dynamic metric $k$-median problem, we wish to maintain a set of $k$ centers $S \\subseteq V$ in an input metric space $(V, d)$ that gets updated via point insertions/deletions, so as to minimize the objective $\\sum_{x \\in V} \\min_{y \\in S} d(x, y)$. The quality of a dynamic algorithm is measured in terms of its approximation ratio, \"recourse\" (the number of changes in $S$ per update) and \"update time\" (the time it takes to handle an update). The ultimate goal in this line of research is to obtain a dynamic $O(1)$ approximation algorithm with $\\tilde{O}(1)$ recourse and $\\tilde{O}(k)$ update time. Dynamic $k$-median is a canonical example of a class of problems known as dynamic $k$-clustering, that has received significant attention in recent years. To the best of our knowledge, however, previous papers either attempt to minimize the algorithm's recourse while ignoring its update time, or minimize the algorithm's update time while ignoring its recourse. For dynamic $k$-median, we come arbitrarily close to resolving the main open question on this topic, with the following results. (I) We develop a new framework of randomized local search that is suitable for adaptation in a dynamic setting. For every $\\epsilon > 0$, this gives us a dynamic $k$-median algorithm with $O(1/\\epsilon)$ approximation ratio, $\\tilde{O}(k^{\\epsilon})$ recourse and $\\tilde{O}(k^{1+\\epsilon})$ update time. This framework also generalizes to dynamic $k$-clustering with $\\ell^p$-norm objectives, giving similar bounds for the dynamic $k$-means and a new trade-off for dynamic $k$-center. (II) If it suffices to maintain only an estimate of the value of the optimal $k$-median objective, then we obtain a $O(1)$ approximation algorithm with $\\tilde{O}(k)$ update time. We achieve this result via adapting the Lagrangian Relaxation framework to the dynamic setting.","PeriodicalId":501525,"journal":{"name":"arXiv - CS - Data Structures and Algorithms","volume":"35 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141946424","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Peaker Guo, Seeun William Umboh, Anthony Wirth, Justin Zobel
{"title":"Online Computation of String Net Frequency","authors":"Peaker Guo, Seeun William Umboh, Anthony Wirth, Justin Zobel","doi":"arxiv-2408.00308","DOIUrl":"https://doi.org/arxiv-2408.00308","url":null,"abstract":"The net frequency (NF) of a string, of length $m$, in a text, of length $n$, is the number of occurrences of the string in the text with unique left and right extensions. Recently, Guo et al. [CPM 2024] showed that NF is combinatorially interesting and how two key questions can be computed efficiently in the offline setting. First, SINGLE-NF: reporting the NF of a query string in an input text. Second, ALL-NF: reporting an occurrence and the NF of each string of positive NF in an input text. For many applications, however, facilitating these computations in an online manner is highly desirable. We are the first to solve the above two problems in the online setting, and we do so in optimal time, assuming, as is common, a constant-size alphabet: SINGLE-NF in $O(m)$ time and ALL-NF in $O(n)$ time. Our results are achieved by first designing new and simpler offline algorithms using suffix trees, proving additional properties of NF, and exploiting Ukkonen's online suffix tree construction algorithm and results on implicit node maintenance in an implicit suffix tree by Breslauer and Italiano.","PeriodicalId":501525,"journal":{"name":"arXiv - CS - Data Structures and Algorithms","volume":"80 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141883289","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Constrained and k Shortest Paths","authors":"Abderrahim Bendahi, Adrien Fradin","doi":"arxiv-2408.00899","DOIUrl":"https://doi.org/arxiv-2408.00899","url":null,"abstract":"Finding a shortest path in a graph is one of the most classic problems in algorithmic and graph theory. While we dispose of quite efficient algorithms for this ordinary problem (like the Dijkstra or Bellman-Ford algorithms), some slight variations in the problem statement can quickly lead to computationally hard problems. This article focuses specifically on two of these variants, namely the constrained shortest paths problem and the k shortest paths problem. Both problems are NP-hard, and thus it's not sure we can conceive a polynomial time algorithm (unless P = NP), ours aren't for instance. Moreover, across this article, we provide ILP formulations of these problems in order to give a different point of view to the interested reader. Although we did not try to implement these on modern ILP solvers, it can be an interesting path to explore. We also mention how these algorithms constitute essential ingredients in some of the most important modern applications in the field of data science, such as Isomap, whose main objective is the reduction of dimensionality of high-dimensional datasets.","PeriodicalId":501525,"journal":{"name":"arXiv - CS - Data Structures and Algorithms","volume":"17 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141946425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sam Coy, Artur Czumaj, Gopinath Mishra, Anish Mukherjee
{"title":"Log Diameter Rounds MST Verification and Sensitivity in MPC","authors":"Sam Coy, Artur Czumaj, Gopinath Mishra, Anish Mukherjee","doi":"arxiv-2408.00398","DOIUrl":"https://doi.org/arxiv-2408.00398","url":null,"abstract":"We consider two natural variants of the problem of minimum spanning tree (MST) of a graph in the parallel setting: MST verification (verifying if a given tree is an MST) and the sensitivity analysis of an MST (finding the lowest cost replacement edge for each edge of the MST). These two problems have been studied extensively for sequential algorithms and for parallel algorithms in the PRAM model of computation. In this paper, we extend the study to the standard model of Massive Parallel Computation (MPC). It is known that for graphs of diameter $D$, the connectivity problem can be solved in $O(\\log D + \\log\\log n)$ rounds on an MPC with low local memory (each machine can store only $O(n^{\\delta})$ words for an arbitrary constant $\\delta > 0$) and with linear global memory, that is, with optimal utilization. However, for the related task of finding an MST, we need $\\Omega(\\log D_{\\text{MST}})$ rounds, where $D_{\\text{MST}}$ denotes the diameter of the minimum spanning tree. The state of the art upper bound for MST is $O(\\log n)$ rounds; the result follows by simulating existing PRAM algorithms. While this bound may be optimal for general graphs, the benchmark of connectivity and lower bound for MST suggest the target bound of $O(\\log D_{\\text{MST}})$ rounds, or possibly $O(\\log D_{\\text{MST}} + \\log\\log n)$ rounds. As for now, we do not know if this bound is achievable for the MST problem on an MPC with low local memory and linear global memory. In this paper, we show that two natural variants of the MST problem: MST verification and sensitivity analysis of an MST, can be completed in $O(\\log D_T)$ rounds on an MPC with low local memory and with linear global memory; here $D_T$ is the diameter of the input ``candidate MST'' $T$. The algorithms asymptotically match our lower bound, conditioned on the 1-vs-2-cycle conjecture.","PeriodicalId":501525,"journal":{"name":"arXiv - CS - Data Structures and Algorithms","volume":"217 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141883357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}