Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures: Latest Papers, Page 5

Scalable Fine-Grained Parallel Cycle Enumeration Algorithms

Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures. Pub Date: 2022-02-19. DOI: 10.1145/3490148.3538585

J. Blanuša, P. Ienne, K. Atasu

Enumerating simple cycles has important applications in computational biology, network science, and financial crime analysis. In this work, we focus on parallelising the state-of-the-art simple cycle enumeration algorithms of Johnson and of Read and Tarjan, along with their applications to temporal graphs. To our knowledge, we are the first to parallelise these two algorithms in a fine-grained manner. We are also the first to demonstrate linear performance scaling experimentally. This scaling is made possible by our decomposition of long sequential searches into fine-grained tasks, which are then dynamically scheduled across CPU cores, enabling optimal load balancing. Furthermore, we show that coarse-grained parallel versions of the Johnson and Read-Tarjan algorithms that exploit edge- or vertex-level parallelism are not scalable. On a cluster of four multi-core CPUs with 256 physical cores, our fine-grained parallel algorithms are, on average, an order of magnitude faster than their coarse-grained parallel counterparts. The performance gap between the fine-grained and coarse-grained parallel algorithms widens as more CPU cores are used. When using all 256 CPU cores, our parallel algorithms enumerate temporal cycles, on average, 260× faster than the serial algorithm of Kumar and Calders.

Citations: 5
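The first abstract credits its linear scaling to splitting long sequential searches into fine-grained tasks. The Python sketch below illustrates only that decomposition idea, under loudly stated assumptions: it is a naive backtracking enumerator of simple directed cycles, not Johnson's or Read-Tarjan's algorithm (it has none of their blocking or pruning), the function name simple_cycles_tasks is hypothetical, and the tasks run serially from one work queue. Each task is a partial path that starts at the smallest vertex of the cycle it may close; expanding a task emits one child task per admissible successor, so a parallel scheduler could hand those children to different cores instead of leaving them buried inside one long sequential search.

```python
from collections import deque
from typing import Deque, Dict, List

# Directed graph as an adjacency list; assumed to have no parallel edges.
Graph = Dict[int, List[int]]


def simple_cycles_tasks(graph: Graph) -> List[List[int]]:
    """Enumerate simple directed cycles by expanding independent search tasks.

    HYPOTHETICAL sketch: naive backtracking, NOT Johnson's or Read-Tarjan's
    algorithm. Each cycle is reported once, starting at its smallest vertex.
    """
    cycles: List[List[int]] = []
    # One root task per start vertex (the coarse, vertex-level unit of work).
    tasks: Deque[List[int]] = deque([v] for v in sorted(graph))
    while tasks:
        path = tasks.pop()  # LIFO order gives a depth-first search
        start, last = path[0], path[-1]
        for nxt in graph.get(last, []):
            if nxt == start:
                cycles.append(list(path))  # closing edge back to the start
            elif nxt > start and nxt not in path:
                # Fine-grained child task: an independent sub-search that a
                # parallel runtime could schedule on any core.
                tasks.append(path + [nxt])
    return cycles
```

Popping from the right end gives depth-first order; in a work-stealing runtime, idle workers would typically take tasks from the opposite end, where the older tasks that tend to root larger sub-searches sit.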

Bamboo Trimming Revisited: Simple Algorithms Can Do Well Too

Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures. Pub Date: 2022-01-18. DOI: 10.1145/3490148.3538580

John Kuszmaul

The bamboo trimming problem considers $n$ bamboo with growth rates $h_1, h_2, \ldots, h_n$ satisfying $\sum_i h_i = 1$. During each unit of time, each bamboo $i$ grows by $h_i$, and then the bamboo-trimming algorithm trims one of the bamboo back down to height zero. The goal is to minimize the height of the tallest bamboo, also known as the backlog. The bamboo trimming problem is closely related to many scheduling problems and can be viewed as a variation of the widely studied fixed-rate cup game, but with constant-factor resource augmentation. Past work has given sophisticated pinwheel algorithms that achieve the optimal backlog of 2 in the bamboo trimming problem. It remained an open question, however, whether there exists a simple algorithm with the same guarantee; recent work has devoted considerable theoretical and experimental effort to answering this question. Two algorithms in particular have appeared as natural candidates: the Reduce-Max algorithm (which always cuts the tallest bamboo) and the Reduce-Fastest($x$) algorithm (which cuts the fastest-growing bamboo among those that have reached at least some height $x$). It has been conjectured that Reduce-Max and Reduce-Fastest(1) both achieve backlog 2. This paper improves the bounds for both Reduce-Fastest and Reduce-Max. Among other results, we show that the exact optimal backlog for Reduce-Fastest($x$) is $x + 1$ for all $x \ge 2$ (proving a conjecture of D'Emidio, Di Stefano, and Navarra in the case $x = 2$), and we show that Reduce-Fastest(1) does not achieve backlog 2 (disproving a conjecture of D'Emidio, Di Stefano, and Navarra). Finally, we show that a different algorithm, which we call the Deadline-Driven Strategy, is both very simple and achieves the optimal backlog of 2. This resolves the question of whether there exists a simple worst-case optimal algorithm for the bamboo trimming problem.

Citations: 2
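The grow-then-trim rules for Reduce-Max and Reduce-Fastest(x) in the abstract above are easy to simulate, which helps build intuition for what "backlog" measures. The Python sketch below is an illustration under stated assumptions, not the paper's analysis: the function names are hypothetical; backlog is taken as the tallest height seen right after growth and before the cut; ties are broken by lowest index; and Reduce-Fastest(x) cuts nothing when no bamboo has reached height x, a case the abstract does not specify. The Deadline-Driven Strategy is left out because the abstract does not define it.

```python
import math
from typing import Callable, List, Optional

# A policy inspects the current heights and growth rates and returns the
# index of the bamboo to cut, or None to cut nothing this step.
Policy = Callable[[List[float], List[float]], Optional[int]]


def reduce_max(heights: List[float], rates: List[float]) -> Optional[int]:
    # Cut the tallest bamboo; max() breaks ties by lowest index.
    return max(range(len(heights)), key=lambda i: heights[i])


def reduce_fastest(x: float) -> Policy:
    def policy(heights: List[float], rates: List[float]) -> Optional[int]:
        # Candidates are bamboo whose height has reached at least x.
        eligible = [i for i, height in enumerate(heights) if height >= x]
        if not eligible:
            return None  # ASSUMPTION: cut nothing if no bamboo has reached x
        # Among candidates, cut the fastest-growing one (lowest index on ties).
        return max(eligible, key=lambda i: rates[i])
    return policy


def backlog(rates: List[float], policy: Policy, steps: int) -> float:
    """Tallest height seen over `steps` rounds of grow-then-cut."""
    if not math.isclose(sum(rates), 1.0):
        raise ValueError("growth rates must sum to 1")
    heights = [0.0] * len(rates)
    worst = 0.0
    for _ in range(steps):
        for i, rate in enumerate(rates):
            heights[i] += rate  # every bamboo i grows by h_i
        worst = max(worst, max(heights))  # ASSUMPTION: measured before the cut
        cut = policy(heights, rates)
        if cut is not None:
            heights[cut] = 0.0  # trim the chosen bamboo back to height zero
    return worst
```

For example, with two bamboo growing at rate 0.5 each, backlog([0.5, 0.5], reduce_max, 10) is 1.0, while backlog([0.5, 0.5], reduce_fastest(1.0), 10) is 1.5: nothing is cut until both reach height 1, and then only one of the tied pair can be trimmed. A finite simulation of one instance only gives a lower bound on an algorithm's worst-case backlog; it proves no upper bound.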

wCQ: A Fast Wait-Free Queue with Bounded Memory Usage

Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures. Pub Date: 2022-01-06. DOI: 10.1145/3490148.3538572

R. Nikolaev, B. Ravindran

Citations: 2

Robust and Optimal Contention Resolution without Collision Detection

Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures. Pub Date: 2021-11-12. DOI: 10.1145/3490148.3538592

Yonggang Jiang, Chaodong Zheng

Contention resolution on a multiple-access communication channel is a classical problem in distributed and parallel computing. In this problem, a set of nodes arrive over time, each with a message it intends to send. Time proceeds in synchronous slots, and in each slot each node can either broadcast its message or remain idle. If a node broadcasts alone in a slot, it succeeds; if multiple nodes broadcast simultaneously, their messages collide and none succeeds. Nodes can distinguish a collision from silence (that is, no node broadcasting) only if a collision detection mechanism is available. Ideally, a contention resolution algorithm should satisfy at least three criteria: (a) low time complexity (i.e., high throughput), meaning it does not take too long for all nodes to succeed; (b) low energy complexity, meaning each node does not make too many broadcast attempts before it succeeds; and (c) strong robustness, meaning the algorithm maintains good performance even when interference is present. Such interference is often modeled by jamming: a jammed slot always generates a collision. Previous work has shown that, with collision detection, there are "perfect" contention resolution algorithms satisfying all three criteria. Without collision detection, on the other hand, it was not until 2020 that an algorithm was discovered that achieves optimal time complexity and low energy cost, assuming there is no jamming. More recently, the trade-off between throughput and robustness has been studied. However, an intriguing and important question remains open: without collision detection, are there "perfect" contention resolution algorithms? In other words, when collision detection is absent and jamming is present, can we achieve both low total time complexity and low per-node energy cost? In this paper, we answer this question affirmatively. Specifically, we develop a new randomized algorithm for robust contention resolution that does not rely on collision detection. Lower-bound results show that it achieves both optimal time complexity and optimal energy complexity. If all nodes start execution simultaneously (often referred to as the "static case" in the literature), we develop another algorithm that runs even faster. This separation in time complexity suggests that, for robust contention resolution without collision detection, "batch" instances (where nodes start simultaneously) are inherently easier than "scattered" ones (where nodes arrive over time).

Citations: 2
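The channel model in the abstract above (synchronous slots, success only for a lone broadcaster, jammed slots always colliding, and silence indistinguishable from collision without collision detection) can be made concrete with a small simulation. The Python sketch below is an illustration only: all function names are hypothetical, and run_static_aloha is a swapped-in slotted-ALOHA-style baseline for the static case, not the paper's algorithm. It assumes every node knows n and broadcasts with a fixed probability p each slot, that a broadcaster learns whether its own transmission succeeded, and that jamming is a set of slot indices fixed in advance.

```python
import random
from typing import List, Set, Tuple


def slot_outcome(num_broadcasters: int, jammed: bool) -> str:
    """Channel outcome of one synchronous slot."""
    if jammed or num_broadcasters >= 2:
        return "collision"  # a jammed slot always generates a collision
    return "success" if num_broadcasters == 1 else "silence"


def listener_feedback(outcome: str, collision_detection: bool) -> str:
    """What a node hears about a slot."""
    if collision_detection or outcome == "success":
        return outcome
    # Without collision detection, silence and collision sound the same.
    return "no-success"


def run_static_aloha(n: int, p: float, jammed_slots: Set[int],
                     seed: int = 0, max_slots: int = 100_000) -> Tuple[int, List[int]]:
    """Static case: n nodes start together; each pending node broadcasts with prob. p.

    HYPOTHETICAL baseline, not the paper's algorithm. Returns the number of
    slots used (time) and the broadcast attempts of each node (energy).
    """
    rng = random.Random(seed)
    pending = set(range(n))
    attempts = [0] * n
    slot = 0
    while pending:
        if slot >= max_slots:
            raise RuntimeError("did not finish within max_slots")
        senders = [v for v in sorted(pending) if rng.random() < p]
        for v in senders:
            attempts[v] += 1  # every broadcast costs energy, even a failed one
        if slot_outcome(len(senders), slot in jammed_slots) == "success":
            pending.discard(senders[0])  # the lone sender succeeds and stops
        slot += 1
    return slot, attempts
```

A jammed slot turns even a lone broadcast into a collision, so the sender pays an attempt without making progress; with a fixed p the baseline keeps spending energy through jamming, which is the kind of time, energy, and robustness trade-off the abstract says a good algorithm must balance.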