{"title":"Many Sequential Iterative Algorithms Can Be Parallel and (Nearly) Work-efficient","authors":"Zheqi Shen, Zijin Wan, Yan Gu, Yihan Sun","doi":"10.1145/3490148.3538574","DOIUrl":"https://doi.org/10.1145/3490148.3538574","url":null,"abstract":"Some recent papers showed that many sequential iterative algorithms can be directly parallelized, by identifying the dependences between the input objects. This approach yields many simple and practical parallel algorithms, but there are still challenges to achieve work-efficiency and high-parallelism. Work-efficiency means that the number of operations is asymptotically the same as the best sequential solution. This can be hard for certain problems where the number of dependences between objects is asymptotically more than optimal sequential work, and we cannot even afford the cost to generate them. To achieve high-parallelism, we always want it to process as many objects as possible in parallel. The goal is to achieve O (D) span for a problem with the deepest dependence length D. We refer to this property as round-efficiency. This paper presents work-efficient and round-efficient algorithms for a variety of classic problems and propose general approaches to do so. To efficiently parallelize many sequential iterative algorithms, we propose the phase-parallel framework. The framework assigns a rank to each object and processes the objects based on the order of their ranks. All objects with the same rank can be processed in parallel. To enable work-efficiency and high parallelism, we use two types of general techniques. Type 1 algorithms aim to use range queries to extract all objects with the same rank to avoid evaluating all the dependences. We discuss activity selection, and Dijkstra's algorithm using Type 1 framework. Type 2 algorithms aim to wake up an object when the last object it depends on is finished. We discuss activity selection, longest increasing subsequence (LIS), greedy maximal independent set (MIS), and many other algorithms using Type 2 framework. All of our algorithms are (nearly) work-efficient and round-efficient, and some of them (e.g., LIS) are the first to achieve the both. Many of them improve the previous best bounds. Moreover, we implement many of them for experimental studies. On inputs with reasonable dependence depth, our algorithms are highly parallelized and significantly outperform their sequential counterparts.","PeriodicalId":112865,"journal":{"name":"Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"260 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121217103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Christian Coester, Roie Levin, J. Naor, Ohad Talmon
{"title":"Competitive Algorithms for Block-Aware Caching","authors":"Christian Coester, Roie Levin, J. Naor, Ohad Talmon","doi":"10.1145/3490148.3538567","DOIUrl":"https://doi.org/10.1145/3490148.3538567","url":null,"abstract":"Motivated by the design of real system storage hierarchies, we study the block-aware caching problem, a generalization of classic caching in which fetching (or evicting) pages from the same block incurs the same cost as fetching (or evicting) just one page from the block. Given a cache of size k, and a sequence of requests from n pages partitioned into given blocks of size β ≤ k, the goal is to minimize the total cost of fetching to (or evicting from) cache. This problem captures generalized caching as a special case, which is already NP-hard offline. We show the following suite of results: For the eviction cost model, we show an O(log k)-approximate offline algorithm, a k-competitive deterministic online algorithm, and an O(log2 k)-competitive randomized online algorithm. For the fetching cost model, we show an integrality gap of Ω(β) for the natural LP relaxation of the problem, and an Ω(β +log k) lower bound for randomized online algorithms. The strategy of ignoring the block-structure and running a classical paging algorithm trivially achieves an O(β) approximation and an O(β log k) competitive ratio respectively for the offline and online-randomized setting. For both fetching and eviction models, we show improved bounds for the (h, k)-bicriteria version of the problem. In particular, when k = 2h, we match the performance of classical caching algorithms up to constant factors. Our results establish a strong separation between the tractability of the fetching and eviction cost models, which is interesting since fetching/eviction costs are the same up to an additive term for the classic caching problem. Previous work of Beckmann et al. (SPAA 21) only studied online deterministic algorithms for the fetching cost model when k > h. Our insight is to relax the block-aware caching problem to a submodular covering linear program. The main technical challenge is to maintain a competitive fractional solution to this LP, and to round it with bounded loss, as the constraints of this LP are revealed online. We hope that this framework is useful going forward for other problems that can be captured as submodular cover.","PeriodicalId":112865,"journal":{"name":"Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129661717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jannik Castenow, Björn Feldkord, Till Knollmann, Manuel Malatyali, F. Heide
{"title":"The k-Server with Preferences Problem","authors":"Jannik Castenow, Björn Feldkord, Till Knollmann, Manuel Malatyali, F. Heide","doi":"10.1145/3490148.3538595","DOIUrl":"https://doi.org/10.1145/3490148.3538595","url":null,"abstract":"The famous k-Server Problem covers plenty of resource allocation scenarios, and several variations have been studied extensively for decades. However, to the best of our knowledge, no research has considered the problem if the servers are not identical and requests can express which specific servers should serve them. Therefore, we present a new model generalizing the k-Server Problem by preferences of the requests and proceed to study it in a uniform metric space for deterministic online algorithms (the special case of paging).","PeriodicalId":112865,"journal":{"name":"Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"50 5","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121008680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Energy Complexity of Las Vegas Leader Election","authors":"Yi-Jun Chang, Shunhua Jiang","doi":"10.1145/3490148.3538586","DOIUrl":"https://doi.org/10.1145/3490148.3538586","url":null,"abstract":"We consider the time (number of communication rounds) and energy (number of non-idle communication rounds per device) complexities of randomized leader election in a multiple-access channel, where the number of devices n ≥ 2 is unknown. It is well-known that for polynomial-time randomized leader election algorithms with success probability 1 - 1/poly(n), the optimal energy complexity is Θ(log log* n) if receivers can detect collisions, and it is Θ(log* n) otherwise.","PeriodicalId":112865,"journal":{"name":"Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130208636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel Batch-Dynamic Minimum Spanning Forest and the Efficiency of Dynamic Agglomerative Graph Clustering","authors":"Tom Tseng, Laxman Dhulipala, Julian Shun","doi":"10.1145/3490148.3538584","DOIUrl":"https://doi.org/10.1145/3490148.3538584","url":null,"abstract":"Hierarchical agglomerative clustering (HAC) is a popular algorithm for clustering data, but despite its importance, no dynamic algorithms for HAC with good theoretical guarantees exist. In this paper, we study dynamic HAC on edge-weighted graphs. As single-linkage HAC reduces to computing a minimum spanning forest (MSF), our first result is a parallel batch-dynamic algorithm for maintaining MSFs. On a batch of k edge insertions or deletions, our batch-dynamic MSF algorithm runs in O(k log6 n) expected amortized work and O(log4 n) span with high probability. It is the first fully dynamic MSF algorithm handling batches of edge updates with polylogarithmic work per update and polylogarithmic span. Using our MSF algorithm, we obtain a parallel batch-dynamic algorithm that can answer queries about single-linkage graph HAC clusters. Our second result is that dynamic graph HAC is significantly harder for other common linkage functions. For example, assuming the strong exponential time hypothesis, dynamic graph HAC requires Ω(n1-o(1)) work per update or query on a graph with n vertices for complete linkage, weighted average linkage, and average linkage. For complete linkage and weighted average linkage, the bound still holds even for incremental or decremental algorithms and even if we allow poly(n)-approximation. For average linkage, the bound weakens to Ω(n1/2-o(1)) for incremental and decremental algorithms, and the bounds still hold when allowing no(1) -approximation.","PeriodicalId":112865,"journal":{"name":"Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"774 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115755663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Marcel Bezdrighin, Michael Elkin, M. Ghaffari, C. Grunau, Bernhard Haeupler, S. Ilchi, Václav Rozhoň
{"title":"Deterministic Distributed Sparse and Ultra-Sparse Spanners and Connectivity Certificates","authors":"Marcel Bezdrighin, Michael Elkin, M. Ghaffari, C. Grunau, Bernhard Haeupler, S. Ilchi, Václav Rozhoň","doi":"10.1145/3490148.3538565","DOIUrl":"https://doi.org/10.1145/3490148.3538565","url":null,"abstract":"This paper presents efficient distributed algorithms for a number of fundamental problems in the area of graph sparsification:We provide the first deterministic distributed algorithm that computes an ultra-sparse spanner in polylog(n) rounds in weighted graphs. Concretely, our algorithm outputs a spanning subgraph with only n + o (n) edges in which the pairwise distances are stretched by a factor of at most O(logn · 2O(log* n) ).","PeriodicalId":112865,"journal":{"name":"Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"181 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115409386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Balanced Allocations in Batches: Simplified and Generalized","authors":"Dimitrios Los, Thomas Sauerwald","doi":"10.1145/3490148.3538593","DOIUrl":"https://doi.org/10.1145/3490148.3538593","url":null,"abstract":"We consider the allocation of m balls (jobs) into n bins (servers). In the Two-Choice process, for each of m sequentially arriving balls, two randomly chosen bins are sampled and the ball is placed in the least loaded bin. It is well-known that the maximum load is m/n + log2 logn + O(1) with high probability. Berenbrink, Czumaj, Englert, Friedetzky and Nagel [7] introduced a parallel version of this process, where m balls arrive in consecutive batches of size b = n each. Balls within the same batch are allocated in parallel, using the load information of the bins at the beginning of the batch. They proved that the gap of this process is O(logn) with high probability. In this work, we present a new analysis of this setting, which is based on exponential potential functions. This allows us to both simplify and generalize the analysis of [7] in different ways: (1) Our analysis covers a broad class of processes. This includes not only Two-Choice, but also processes with fewer bin samples like the (1 + β)-process, processes which can only receive one bit of information from each bin sample and graphical allocation, where bins correspond to vertices in a graph. (2) Balls may be of different weights, as long as their weights are independent samples from a distribution satisfying a technical condition on its moment generating function. (3) For any batch sizes b ≥ n, we prove a gap of is O (b/n·logn). For any b ∈ [n, n3], we improve this to is O (b/n + logn) and show that it is tight for a family of processes. This implies the unexpected result that for e.g. the (1 + β)-process with constant β ∈ (0, 1], the gap is Θ(logn) for all b ∈ [n, n logn]. We also conduct experiments which support our theoretical results, and even hint at a superiority of less powerful processes like (1+ β) for large batch sizes. Full version of the paper at: https://arxiv.org/abs/2203.13902.","PeriodicalId":112865,"journal":{"name":"Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"88 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124825433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chetan Gupta, J. Hirvonen, Janne H. Korhonen, Jan Studen'y, J. Suomela
{"title":"Sparse Matrix Multiplication in the Low-Bandwidth Model","authors":"Chetan Gupta, J. Hirvonen, Janne H. Korhonen, Jan Studen'y, J. Suomela","doi":"10.1145/3490148.3538575","DOIUrl":"https://doi.org/10.1145/3490148.3538575","url":null,"abstract":"We study matrix multiplication in the low-bandwidth model: There are n computers, and we need to compute the product of two n × n matrices. Initially computer i knows row i of each input matrix. In one communication round each computer can send and receive one O(logn)-bit message. Eventually computer i has to output row i of the product matrix. We seek to understand the complexity of this problem in the uniformly sparse case: each row and column of each input matrix has at most d non-zeros and in the product matrix we only need to know the values of at most d elements in each row or column. This is exactly the setting that we have, e.g., when we apply matrix multiplication for triangle detection in graphs of maximum degree d. We focus on the supported setting: the structure of the matrices is known in advance; only the numerical values of nonzero elements are unknown.","PeriodicalId":112865,"journal":{"name":"Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133673385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Olivier Beaumont, Lionel Eyraud-Dubois, Mathieu Vérité, J. Langou
{"title":"I/O-Optimal Algorithms for Symmetric Linear Algebra Kernels","authors":"Olivier Beaumont, Lionel Eyraud-Dubois, Mathieu Vérité, J. Langou","doi":"10.1145/3490148.3538587","DOIUrl":"https://doi.org/10.1145/3490148.3538587","url":null,"abstract":"In this paper, we consider two fundamental symmetric kernels in linear algebra: the Cholesky factorization and the symmetric rank-k update (SYRK), with the classical three nested loops algorithms for these kernels. In addition, we consider a machine model with a fast memory of size S and an unbounded slow memory. In this model, all computations must be performed on operands in fast memory, and the goal is to minimize the amount of communication between slow and fast memories. As the set of computations is fixed by the choice of the algorithm, only the ordering of the computations (the schedule) directly influences the volume of communications.","PeriodicalId":112865,"journal":{"name":"Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128547225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Permutation Predictions for Non-Clairvoyant Scheduling","authors":"Alexander Lindermayr, Nicole Megow","doi":"10.1145/3490148.3538579","DOIUrl":"https://doi.org/10.1145/3490148.3538579","url":null,"abstract":"In non-clairvoyant scheduling, the task is to find an online strategy for scheduling jobs with a priori unknown processing requirements with the objective to minimize the total (weighted) completion time. We revisit this well-studied problem in a recently popular learning-augmented setting that integrates (untrusted) predictions in online algorithm design. While previous works used predictions on processing requirements, we propose a new prediction model, which provides a relative order of jobs which could be seen as predicting algorithmic actions rather than parts of the unknown input. We show that these predictions have desired properties, admit a natural error measure as well as algorithms with strong performance guarantees and that they are learnable in both, theory and practice. We generalize the algorithmic framework proposed in the seminal paper by Kumar et al. (NeurIPS'18) and present the first learning-augmented scheduling results for weighted jobs and unrelated machines. We demonstrate in empirical experiments the practicability and superior performance compared to the previously suggested single-machine algorithms.","PeriodicalId":112865,"journal":{"name":"Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122237721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}