Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures: Latest Papers

Many Sequential Iterative Algorithms Can Be Parallel and (Nearly) Work-efficient

Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures Pub Date : 2022-05-25 DOI: 10.1145/3490148.3538574

Zheqi Shen, Zijin Wan, Yan Gu, Yihan Sun

Abstract: Some recent papers showed that many sequential iterative algorithms can be directly parallelized by identifying the dependences between the input objects. This approach yields many simple and practical parallel algorithms, but challenges remain in achieving work-efficiency and high parallelism. Work-efficiency means that the number of operations is asymptotically the same as that of the best sequential solution. This can be hard for certain problems where the number of dependences between objects is asymptotically larger than the optimal sequential work, and we cannot even afford the cost of generating them. To achieve high parallelism, we want to process as many objects as possible in parallel. The goal is to achieve O(D) span for a problem whose deepest dependence length is D. We refer to this property as round-efficiency. This paper presents work-efficient and round-efficient algorithms for a variety of classic problems and proposes general approaches for doing so.

To efficiently parallelize many sequential iterative algorithms, we propose the phase-parallel framework. The framework assigns a rank to each object and processes the objects in the order of their ranks. All objects with the same rank can be processed in parallel. To enable work-efficiency and high parallelism, we use two types of general techniques. Type 1 algorithms aim to use range queries to extract all objects with the same rank, which avoids evaluating all the dependences. We discuss activity selection and Dijkstra's algorithm using the Type 1 framework. Type 2 algorithms aim to wake up an object when the last object it depends on is finished. We discuss activity selection, longest increasing subsequence (LIS), greedy maximal independent set (MIS), and many other algorithms using the Type 2 framework. All of our algorithms are (nearly) work-efficient and round-efficient, and some of them (e.g., LIS) are the first to achieve both. Many of them improve the previous best bounds. Moreover, we implement many of them for experimental studies. On inputs with reasonable dependence depth, our algorithms are highly parallelized and significantly outperform their sequential counterparts.
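The rank-based processing and the Type 2 wake-up rule described above can be illustrated with a small sequential Python sketch. It is a simplification under stated assumptions, not the paper's algorithm. It materializes every dependence explicitly (O(n²) of them for LIS), which is exactly the cost the paper's techniques avoid, and it runs each round sequentially. The helper names `lis_dependences` and `phase_parallel_rounds` are hypothetical. For LIS, object v depends on every earlier u with a[u] < a[v]. As a result, the round in which v wakes up equals the length of the longest increasing subsequence ending at v, and the number of rounds equals the dependence depth D.

```python
def lis_dependences(a):
    """Hypothetical helper: v depends on u if u comes earlier and a[u] < a[v].
    Materializes all O(n^2) dependences, which a work-efficient method avoids."""
    return [(u, v) for v in range(len(a)) for u in range(v) if a[u] < a[v]]


def phase_parallel_rounds(n, deps):
    """Group objects 0..n-1 into rounds. An object is woken up (Type 2 style)
    as soon as the last object it depends on has finished."""
    succ = [[] for _ in range(n)]
    pending = [0] * n  # number of unfinished objects each object depends on
    for u, v in deps:
        succ[u].append(v)
        pending[v] += 1
    frontier = [v for v in range(n) if pending[v] == 0]
    rounds = []
    while frontier:
        rounds.append(frontier)  # same rank: could be processed in parallel
        woken = []
        for u in frontier:
            for v in succ[u]:
                pending[v] -= 1
                if pending[v] == 0:  # u was v's last unfinished dependence
                    woken.append(v)
        frontier = woken
    if sum(len(r) for r in rounds) != n:
        raise ValueError("dependences contain a cycle")
    return rounds


a = [3, 1, 4, 1, 5, 9, 2, 6]
rounds = phase_parallel_rounds(len(a), lis_dependences(a))
print(rounds)       # [[0, 1, 3], [2, 6], [4], [5, 7]]
print(len(rounds))  # 4, the LIS length
```

Each object in a round depends only on objects from earlier rounds, so a parallel implementation could process a whole round concurrently. Here the number of rounds (4) matches the LIS length of the input.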

Citations: 11

Competitive Algorithms for Block-Aware Caching

Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures Pub Date : 2022-05-24 DOI: 10.1145/3490148.3538567

Christian Coester, Roie Levin, J. Naor, Ohad Talmon

Abstract: Motivated by the design of real system storage hierarchies, we study the block-aware caching problem, a generalization of classic caching in which fetching (or evicting) pages from the same block incurs the same cost as fetching (or evicting) just one page from the block. Given a cache of size k and a sequence of requests from n pages partitioned into given blocks of size β ≤ k, the goal is to minimize the total cost of fetching to (or evicting from) the cache. This problem captures generalized caching as a special case, which is already NP-hard offline. We show the following suite of results:

For the eviction cost model, we show an O(log k)-approximate offline algorithm, a k-competitive deterministic online algorithm, and an O(log² k)-competitive randomized online algorithm.

For the fetching cost model, we show an integrality gap of Ω(β) for the natural LP relaxation of the problem, and an Ω(β + log k) lower bound for randomized online algorithms. The strategy of ignoring the block structure and running a classical paging algorithm trivially achieves an O(β) approximation and an O(β log k) competitive ratio for the offline and online-randomized settings, respectively.

For both the fetching and eviction models, we show improved bounds for the (h, k)-bicriteria version of the problem. In particular, when k = 2h, we match the performance of classical caching algorithms up to constant factors.

Our results establish a strong separation between the tractability of the fetching and eviction cost models. This is interesting because fetching and eviction costs are the same up to an additive term for the classic caching problem. Previous work of Beckmann et al. (SPAA 21) only studied online deterministic algorithms for the fetching cost model when k > h. Our insight is to relax the block-aware caching problem to a submodular covering linear program. The main technical challenge is to maintain a competitive fractional solution to this LP and to round it with bounded loss, as the constraints of this LP are revealed online. We hope that this framework will prove useful for other problems that can be captured as submodular cover.
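The minimal sketch below makes the fetching cost model concrete. It assumes a cost of one unit per distinct block among the pages fetched in a single step; the eviction cost model would charge evicted pages analogously. `block_fetch_cost` and `lru_fetch_steps` are hypothetical names. LRU stands in for "a classical paging algorithm" that ignores the block structure. None of this is the paper's LP-based algorithm.

```python
from collections import OrderedDict


def block_fetch_cost(fetch_steps, block_of):
    """Fetching-cost model (assumed form): pages fetched in the same step
    that share a block are charged once, so each step costs #distinct blocks."""
    return sum(len({block_of[p] for p in step}) for step in fetch_steps)


def lru_fetch_steps(requests, k):
    """Block-oblivious LRU: on each miss, fetch only the requested page,
    evicting the least recently used page if the cache (size k) is full."""
    cache = OrderedDict()
    steps = []
    for p in requests:
        if p in cache:
            cache.move_to_end(p)  # hit: mark as most recently used
            continue
        if len(cache) == k:
            cache.popitem(last=False)  # evict the least recently used page
        cache[p] = True
        steps.append([p])  # exactly one page fetched in this step
    return steps


block_of = {0: "A", 1: "A", 2: "B", 3: "B"}  # blocks of size beta = 2
requests = [0, 1, 2, 3]
lru = lru_fetch_steps(requests, k=4)
print(block_fetch_cost(lru, block_of))                # 4
# Hypothetical block-aware schedule: fetch each whole block on its first miss
print(block_fetch_cost([[0, 1], [2, 3]], block_of))   # 2
```

With blocks A = {0, 1} and B = {2, 3} (β = 2 ≤ k = 4), block-oblivious LRU pays once per page, while fetching each block's pages together pays once per block. The factor-β loss in this toy example is consistent with the intuition behind the O(β) bound for ignoring blocks.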

Citations: 2

The k-Server with Preferences Problem

Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures Pub Date : 2022-05-23 DOI: 10.1145/3490148.3538595

Jannik Castenow, Björn Feldkord, Till Knollmann, Manuel Malatyali, F. Heide

Citations: 2

The Energy Complexity of Las Vegas Leader Election

Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures Pub Date : 2022-05-17 DOI: 10.1145/3490148.3538586

Yi-Jun Chang, Shunhua Jiang

Citations: 2

Parallel Batch-Dynamic Minimum Spanning Forest and the Efficiency of Dynamic Agglomerative Graph Clustering

Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures Pub Date : 2022-05-10 DOI: 10.1145/3490148.3538584

Tom Tseng, Laxman Dhulipala, Julian Shun

Abstract: Hierarchical agglomerative clustering (HAC) is a popular algorithm for clustering data. Despite its importance, no dynamic algorithms for HAC with good theoretical guarantees exist. In this paper, we study dynamic HAC on edge-weighted graphs. Because single-linkage HAC reduces to computing a minimum spanning forest (MSF), our first result is a parallel batch-dynamic algorithm for maintaining MSFs. On a batch of k edge insertions or deletions, our batch-dynamic MSF algorithm runs in O(k log⁶ n) expected amortized work and O(log⁴ n) span with high probability. It is the first fully dynamic MSF algorithm that handles batches of edge updates with polylogarithmic work per update and polylogarithmic span. Using our MSF algorithm, we obtain a parallel batch-dynamic algorithm that can answer queries about single-linkage graph HAC clusters.

Our second result is that dynamic graph HAC is significantly harder for other common linkage functions. For example, assuming the strong exponential time hypothesis, dynamic graph HAC requires Ω(n^(1-o(1))) work per update or query on a graph with n vertices for complete linkage, weighted average linkage, and average linkage. For complete linkage and weighted average linkage, the bound holds even for incremental or decremental algorithms, and even if we allow poly(n)-approximation. For average linkage, the bound weakens to Ω(n^(1/2-o(1))) for incremental and decremental algorithms, and the bounds still hold when allowing n^(o(1))-approximation.
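The reduction from single-linkage HAC to MSF can be seen in the static, sequential setting with Kruskal's algorithm. Processing edges by increasing weight (treated here as a dissimilarity) and merging the two clusters each edge connects produces the single-linkage merge sequence, and the merging edges form an MSF. This sketch, with the hypothetical function name `single_linkage_merges`, does not reflect the paper's parallel batch-dynamic algorithm. Ties between equal weights are broken by Python's tuple ordering.

```python
def single_linkage_merges(n, edges):
    """Static single-linkage graph HAC via Kruskal's algorithm.
    edges: list of (weight, u, v), weight = dissimilarity.
    Returns the merges (weight, u, v) in HAC order; these edges form an MSF."""
    parent = list(range(n))  # union-find forest over vertices 0..n-1

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    merges = []
    for w, u, v in sorted(edges):  # cheapest inter-cluster edge first
        ru, rv = find(u), find(v)
        if ru != rv:               # edge joins two different clusters: merge them
            parent[ru] = rv
            merges.append((w, u, v))
    return merges


edges = [(4, 0, 1), (1, 1, 2), (3, 0, 2), (2, 3, 4)]
print(single_linkage_merges(5, edges))  # [(1, 1, 2), (2, 3, 4), (3, 0, 2)]
```

Vertices 3 and 4 never connect to {0, 1, 2}, so the result is a forest with two trees. The edge (0, 1) of weight 4 is skipped because its endpoints already belong to one cluster.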

Citations: 2

Deterministic Distributed Sparse and Ultra-Sparse Spanners and Connectivity Certificates

Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures Pub Date : 2022-04-29 DOI: 10.1145/3490148.3538565

Marcel Bezdrighin, Michael Elkin, M. Ghaffari, C. Grunau, Bernhard Haeupler, S. Ilchi, Václav Rozhoň

Citations: 3

Balanced Allocations in Batches: Simplified and Generalized

Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures Pub Date : 2022-03-25 DOI: 10.1145/3490148.3538593

Dimitrios Los, Thomas Sauerwald

Abstract: We consider the allocation of m balls (jobs) into n bins (servers). In the Two-Choice process, for each of m sequentially arriving balls, two randomly chosen bins are sampled and the ball is placed in the less loaded of the two. It is well known that the maximum load is m/n + log₂ log n + O(1) with high probability. Berenbrink, Czumaj, Englert, Friedetzky and Nagel [7] introduced a parallel version of this process, where m balls arrive in consecutive batches of size b = n each. Balls within the same batch are allocated in parallel, using the load information of the bins at the beginning of the batch. They proved that the gap of this process is O(log n) with high probability.

In this work, we present a new analysis of this setting that is based on exponential potential functions. This allows us to both simplify and generalize the analysis of [7] in different ways:

(1) Our analysis covers a broad class of processes. This includes not only Two-Choice but also processes with fewer bin samples, such as the (1 + β)-process; processes that can receive only one bit of information from each bin sample; and graphical allocation, where bins correspond to vertices in a graph.
(2) Balls may have different weights, as long as their weights are independent samples from a distribution satisfying a technical condition on its moment generating function.
(3) For any batch size b ≥ n, we prove a gap of O(b/n · log n). For any b ∈ [n, n³], we improve this to O(b/n + log n) and show that it is tight for a family of processes. This implies the unexpected result that, for example, for the (1 + β)-process with constant β ∈ (0, 1], the gap is Θ(log n) for all b ∈ [n, n log n].

We also conduct experiments that support our theoretical results and even hint at the superiority of less powerful processes like the (1 + β)-process for large batch sizes. Full version of the paper at: https://arxiv.org/abs/2203.13902.
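The batched setting can be simulated directly. The sketch below uses the hypothetical function `batched_two_choice`. It assumes unit-weight balls and random tie-breaking, since the abstract does not fix a tie rule. It reports the gap as the maximum load minus the average load m/n. With b = 1 it reduces to the sequential Two-Choice process, and b = n corresponds to the setting of Berenbrink et al. It is only an illustration and reproduces none of the paper's experiments.

```python
import random


def batched_two_choice(n, m, b, seed=0):
    """Allocate m unit balls into n bins in consecutive batches of size b.
    Each ball samples two bins uniformly at random and goes to the one whose
    load was smaller at the start of its batch (stale load information)."""
    rng = random.Random(seed)
    load = [0] * n
    placed = 0
    while placed < m:
        size = min(b, m - placed)  # the last batch may be smaller
        snapshot = load[:]         # load information frozen for this batch
        for _ in range(size):
            i, j = rng.randrange(n), rng.randrange(n)
            if snapshot[i] != snapshot[j]:
                target = i if snapshot[i] < snapshot[j] else j
            else:
                target = rng.choice((i, j))  # assumption: random tie-breaking
            load[target] += 1
        placed += size
    gap = max(load) - m / n  # maximum load minus average load
    return gap, load


gap, load = batched_two_choice(n=100, m=10_000, b=100)
print(f"gap = {gap:.2f}")
```

Freezing a snapshot per batch is the key design choice. Every ball in a batch compares the same stale loads, which is what lets a batch be allocated in parallel.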

Citations: 6

Sparse Matrix Multiplication in the Low-Bandwidth Model

Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures Pub Date : 2022-03-02 DOI: 10.1145/3490148.3538575

Chetan Gupta, J. Hirvonen, Janne H. Korhonen, Jan Studený, J. Suomela

Citations: 0

I/O-Optimal Algorithms for Symmetric Linear Algebra Kernels

Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures Pub Date : 2022-02-21 DOI: 10.1145/3490148.3538587

Olivier Beaumont, Lionel Eyraud-Dubois, Mathieu Vérité, J. Langou

Citations: 5

Permutation Predictions for Non-Clairvoyant Scheduling

Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures Pub Date : 2022-02-21 DOI: 10.1145/3490148.3538579

Alexander Lindermayr, Nicole Megow

Abstract: In non-clairvoyant scheduling, the task is to find an online strategy for scheduling jobs with a priori unknown processing requirements, with the objective of minimizing the total (weighted) completion time. We revisit this well-studied problem in a recently popular learning-augmented setting that integrates (untrusted) predictions into online algorithm design. While previous works used predictions of processing requirements, we propose a new prediction model that provides a relative order of jobs and can be seen as predicting algorithmic actions rather than parts of the unknown input. We show that these predictions have desirable properties, that they admit a natural error measure as well as algorithms with strong performance guarantees, and that they are learnable in both theory and practice. We generalize the algorithmic framework proposed in the seminal paper by Kumar et al. (NeurIPS'18) and present the first learning-augmented scheduling results for weighted jobs and unrelated machines. In empirical experiments, we demonstrate the practicality of our approach and its superior performance compared to the previously suggested single-machine algorithms.
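The sketch below shows what a permutation prediction buys on a single machine. It follows a predicted job order without preemption and evaluates the total weighted completion time in hindsight. It then compares the result with Smith's rule (sort by p_j / w_j), which is optimal when processing times are known. The function names are hypothetical. Blindly following the prediction is only a naive baseline: unlike the algorithms with performance guarantees described in the abstract, it can be far from optimal when the prediction is poor.

```python
def total_weighted_completion(order, p, w):
    """Single machine, no preemption: run jobs in `order` and return
    the sum of w[j] * C_j, where C_j is job j's completion time."""
    t = 0
    total = 0
    for j in order:
        t += p[j]          # job j finishes at time t
        total += w[j] * t
    return total


def smith_order(p, w):
    """Smith's rule (WSPT): sort by p_j / w_j. Optimal on one machine,
    but it needs the processing times p, which a non-clairvoyant scheduler lacks."""
    return sorted(range(len(p)), key=lambda j: p[j] / w[j])


p = [3, 1, 2]           # processing requirements, only used to evaluate in hindsight
w = [2, 1, 1]           # job weights
predicted = [0, 1, 2]   # hypothetical, imperfect predicted permutation
print(total_weighted_completion(predicted, p, w))           # 16
print(smith_order(p, w))                                    # [1, 0, 2]
print(total_weighted_completion(smith_order(p, w), p, w))   # 15
```

Here the imperfect prediction costs 16, while the optimal order [1, 0, 2] costs 15. A prediction equal to that order would reach the optimum without the scheduler ever learning p in advance.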

Citations: 20