Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures: Latest Articles (Page 3)

Automatic HBM Management: Models and Algorithms

Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures Pub Date : 2022-07-11 DOI: 10.1145/3490148.3538570

Daniel DeLayo, Kenny Zhang, Kunal Agrawal, M. A. Bender, Jonathan W. Berry, Rathish Das, Benjamin Moseley, C. Phillips

Abstract: Some past and future supercomputer nodes incorporate High-Bandwidth Memory (HBM). Compared to standard DRAM, HBM has similar latency, higher bandwidth, and lower capacity. In this paper, we evaluate algorithms for managing High-Bandwidth Memory automatically. Previous work suggests that, in the worst case, performance is extremely sensitive to the policy for managing the channel to DRAM. Prior theory shows that a priority-based scheme (where there is a static strict priority order among p threads for channel access) is O(1)-competitive, but FIFO is not, and in the worst case is Ω(p)-competitive. Following this theoretical guidance would be a disruptive change for vendors, who currently use FIFO variants in their DRAM controller hardware. Our goal is to determine theoretically and empirically whether we can justify recommending investment in priority-based DRAM controller hardware. In order to experiment with DRAM channel protocols, we chose a theoretical model, validated it against real hardware, and implemented a basic simulator. We corroborated the previous theoretical results for the model, conducted a parameter sweep while running our simulator on address traces from memory bandwidth-bound codes (GNU sort and TACO sparse matrix-vector product), and designed better channel-access algorithms.
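The difference between the two channel-arbitration rules named in the abstract can be illustrated with a toy sketch. It is not the authors' simulator or model: the function name `simulate`, the one-request-per-tick channel, and the rule that each thread keeps exactly one outstanding request are all assumptions made for illustration, and the sketch does not reproduce the competitive-ratio results.

```python
from collections import deque

def simulate(misses, policy):
    """Toy DRAM channel: one request served per tick (an assumption).

    misses[t] is the number of requests thread t must have served.
    Each thread keeps one outstanding request and re-queues after service.
    Returns the tick at which each thread's last request is served.
    """
    remaining = list(misses)
    # All threads with work issue their first request at tick 0, in thread order.
    queue = deque(t for t, m in enumerate(remaining) if m > 0)
    finish = [0] * len(misses)
    tick = 0
    while queue:
        tick += 1
        if policy == "fifo":
            t = queue.popleft()      # oldest waiting request first
        elif policy == "priority":
            t = min(queue)           # static strict order: lowest thread id wins
            queue.remove(t)
        else:
            raise ValueError(f"unknown policy: {policy}")
        remaining[t] -= 1
        if remaining[t] > 0:
            queue.append(t)          # thread issues its next request
        else:
            finish[t] = tick
    return finish

print(simulate([3, 3, 3], "fifo"))      # [7, 8, 9]
print(simulate([3, 3, 3], "priority"))  # [3, 6, 9]
```

In this toy both rules finish all work at tick 9, but the priority rule lets the highest-priority thread finish far earlier. In the HBM setting a thread that finishes early also stops competing for limited HBM capacity, which this sketch does not model.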

Citations: 2

Massively Parallel Algorithms for b-Matching

Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures Pub Date : 2022-07-11 DOI: 10.1145/3490148.3538589

M. Ghaffari, C. Grunau, Slobodan Mitrovic

Citations: 2

Parallel Batch-Dynamic Algorithms for k-Core Decomposition and Related Graph Problems

Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures Pub Date : 2022-07-11 DOI: 10.1145/3490148.3538569

Quanquan C. Liu, Jessica Shi, Shangdi Yu, Laxman Dhulipala, Julian Shun

Citations: 2

PREP-UC: A Practical Replicated Persistent Universal Construction

Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures Pub Date : 2022-07-11 DOI: 10.1145/3490148.3538568

Gaetano Coccimiglio, Trevor Brown, Srivatsan Ravi

Abstract: The process of designing and implementing correct concurrent data structures is non-trivial and often error-prone. The recent commercial availability of non-volatile memory has prompted many researchers to also consider designing concurrent data structures that persist shared state, allowing the data structure to be recovered following a power failure. These so-called persistent concurrent data structures further complicate the process of achieving correct and efficient implementations. Universal constructions (UCs), which produce a concurrent object given a sequential object, have been studied extensively in the space of volatile shared memory as a means of more easily implementing correct concurrent data structures. In contrast, there are only a handful of persistent universal constructions (PUCs) which, beyond producing a concurrent object from a sequential object, guarantee that the object can be recovered following a crash. Existing PUCs satisfy the correctness condition of durable linearizability, which requires that operations are persisted before they complete. Satisfying the weaker correctness condition of buffered durable linearizability allows for improved performance at the cost of failing to recover some completed operations following a crash. In this work, we design and implement both a buffered durable linearizable and a durable linearizable PUC based on the node replication UC. We demonstrate that we can achieve significantly better performance satisfying buffered durable linearizability while also restricting the maximum number of operations that can be lost after a crash.
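The contrast between persisting every operation before it completes and buffering operations with a bounded loss window can be sketched as follows. This is not PREP-UC and does not use node replication: the class `LoggedObject`, its global lock, the `flush_every` parameter, and the in-memory list standing in for non-volatile memory are all hypothetical, chosen only to make the two correctness conditions concrete.

```python
import threading

class LoggedObject:
    """Hypothetical lock-based wrapper around a sequential object.

    flush_every=1 persists each operation before it returns (durable style).
    flush_every=N > 1 buffers operations, so a crash may lose up to N - 1
    completed operations (buffered style with a bounded loss window).
    """

    def __init__(self, make_object, flush_every=1):
        self._make_object = make_object
        self._obj = make_object()
        self._lock = threading.Lock()
        self._volatile = []     # completed operations not yet persisted
        self._persisted = []    # stand-in for a log in non-volatile memory
        self._flush_every = flush_every

    def apply(self, method, *args):
        with self._lock:
            result = getattr(self._obj, method)(*args)
            self._volatile.append((method, args))
            if len(self._volatile) >= self._flush_every:
                self._persisted.extend(self._volatile)  # simulated flush
                self._volatile.clear()
            return result

    def recover(self):
        """Simulate a crash: replay only the persisted log on a fresh object."""
        obj = self._make_object()
        for method, args in self._persisted:
            getattr(obj, method)(*args)
        return obj

durable = LoggedObject(list, flush_every=1)
buffered = LoggedObject(list, flush_every=2)
for value in (1, 2, 3):
    durable.apply("append", value)
    buffered.apply("append", value)

print(durable.recover())   # [1, 2, 3]
print(buffered.recover())  # [1, 2]
```

The buffered instance loses the third append after the simulated crash, which stays within its bound of `flush_every - 1` lost operations.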

Citations: 0

Online Parallel Paging with Optimal Makespan

Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures Pub Date : 2022-07-11 DOI: 10.1145/3490148.3538577

Kunal Agrawal, M. A. Bender, Rathish Das, William Kuszmaul, E. Peserico, Michele Scquizzato

Citations: 3

Approximate Dynamic Balanced Graph Partitioning

Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures Pub Date : 2022-07-11 DOI: 10.1145/3490148.3538563

Harald Räcke, Stefan Schmid, R. Zabrodin

Abstract: Networked systems are increasingly flexible and reconfigurable. This enables demand-aware infrastructures whose resources can be adjusted according to the traffic pattern they currently serve. This paper revisits the dynamic balanced graph partitioning problem, a generalization of the classic balanced graph partitioning problem. We are given a set P of n = kℓ processes which communicate over time according to a given request sequence σ. The processes are assigned to ℓ servers (each of capacity k), and a scheduler can change this assignment dynamically to reduce communication costs, at cost α per node move. Avin et al. showed an Ω(k) lower bound on the competitive ratio of any deterministic online algorithm, even in a model with resource augmentation, and presented an O(k log k)-competitive online algorithm. We study the offline version of this problem where σ is known to the algorithm. Our main contribution is a polynomial-time algorithm which provides an O(log n)-approximation with resource augmentation. Our algorithm relies on an integer linear program formulation in a metric space with spreading constraints. We relax the formulation to a linear program and employ Bartal's clustering algorithm in a novel way to round it.
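The cost model described in the abstract can be made concrete with a small evaluator for a given migration schedule. This is only a sketch of the objective, not the paper's approximation algorithm. The function name `schedule_cost`, the unit cost for a request between processes on different servers, and the convention that moves are applied before the request they accompany are assumptions.

```python
from collections import Counter

def schedule_cost(assignment, requests, moves, alpha, capacity):
    """Evaluate a migration schedule under an assumed cost model.

    assignment: dict mapping process -> server (initial placement)
    requests:   list of (u, v) pairs, the request sequence sigma
    moves:      moves[i] is a list of (process, new_server) pairs applied
                before request i is served (assumed ordering)
    alpha:      cost per node move
    capacity:   maximum processes per server (may exceed k to model
                resource augmentation)
    """
    placement = dict(assignment)
    total = 0
    for (u, v), step_moves in zip(requests, moves):
        for process, server in step_moves:
            if placement[process] != server:
                placement[process] = server
                total += alpha
        load = Counter(placement.values())
        if max(load.values()) > capacity:
            raise ValueError("server capacity exceeded")
        if placement[u] != placement[v]:
            total += 1  # assumed unit cost for a remote request
    return total

start = {"a": 0, "b": 0, "c": 1, "d": 1}
sigma = [("a", "c")] * 3
print(schedule_cost(start, sigma, [[], [], []], alpha=1, capacity=2))  # 3
swap = [[("b", 1), ("c", 0)], [], []]
print(schedule_cost(start, sigma, swap, alpha=1, capacity=2))          # 2
```

Swapping b and c costs two moves but makes all three requests local, so the schedule with migration is cheaper here. With a larger alpha the static placement would win.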

Citations: 2

Balancing Flow Time and Energy Consumption

Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures Pub Date : 2022-06-03 DOI: 10.1145/3490148.3538582

Sami Davies, S. Khuller, Shi-Han Zhang

Abstract: In this paper, we study the following batch scheduling model: find a schedule that minimizes total flow time for n uniform length jobs, with release times and deadlines, where the machine is only actively processing jobs in at most k synchronized batches of size at most B. Prior work on such batch scheduling models has considered only feasibility with no regard to the flow time of the schedule. However, algorithms that minimize the cost from the scheduler's perspective (such as ones that minimize the active time of the processor) can result in schedules where the total flow time is arbitrarily high [15]. Such schedules are not valuable from the perspective of the client. In response, our work provides dynamic programs which minimize flow time subject to active time constraints. Our main contribution focuses on jobs with agreeable deadlines; for such job instances, we introduce dynamic programs that achieve runtimes of O(B · k · n) for unit jobs and O(B · k · n^5) for uniform length jobs. These results improve upon our modification of a different, classical dynamic programming approach by Baptiste. While the modified DP works when deadlines are non-agreeable, this solution is more expensive, with runtime O(B · k^2 · n^7) [7].
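The objective and constraints of the batch scheduling model can be sketched as a checker that validates a schedule and reports its total flow time. This is not one of the paper's dynamic programs. The function name `total_flow_time`, the representation of a schedule as batch start times plus a job-to-batch assignment, and the definition of flow time as completion time minus release time are assumptions.

```python
def total_flow_time(jobs, batch_starts, assignment, length, k, B):
    """Validate a batch schedule and return its total flow time.

    jobs:         list of (release, deadline) pairs
    batch_starts: start time of each batch; a batch runs for `length` units
    assignment:   assignment[j] is the index of the batch that runs job j
    k, B:         maximum number of batches and maximum batch size
    """
    if len(batch_starts) > k:
        raise ValueError("more than k batches")
    ordered = sorted(batch_starts)
    for earlier, later in zip(ordered, ordered[1:]):
        if later < earlier + length:
            raise ValueError("batches overlap on the single machine")
    sizes = [0] * len(batch_starts)
    total = 0
    for (release, deadline), batch in zip(jobs, assignment):
        start = batch_starts[batch]
        finish = start + length
        if start < release or finish > deadline:
            raise ValueError("job scheduled outside its window")
        sizes[batch] += 1
        total += finish - release  # flow time of this job
    if any(size > B for size in sizes):
        raise ValueError("batch exceeds size B")
    return total

jobs = [(0, 3), (0, 3), (1, 4)]
print(total_flow_time(jobs, [1], [0, 0, 0], length=1, k=1, B=3))     # 5
print(total_flow_time(jobs, [0, 1], [0, 0, 1], length=1, k=2, B=2))  # 3
```

The single-batch schedule keeps the machine active for one time unit but has total flow time 5. Allowing a second batch lowers the flow time to 3, which is the trade-off between active time and flow time that the abstract describes.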

Citations: 1

Fully Polynomial-Time Distributed Computation in Low-Treewidth Graphs

Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures Pub Date : 2022-05-30 DOI: 10.1145/3490148.3538590

Taisuke Izumi, Naoki Kitamura, Takamasa Naruse, Gregory Schwartzman

Citations: 4

Adaptive Massively Parallel Algorithms for Cut Problems

Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures Pub Date : 2022-05-27 DOI: 10.1145/3490148.3538576

M. Hajiaghayi, Marina Knittel, J. Olkowski, Hamed Saleh

Citations: 2

Brief Announcement: Tight Memory-Independent Parallel Matrix Multiplication Communication Lower Bounds

Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures Pub Date : 2022-05-26 DOI: 10.1145/3490148.3538552

Hussam Al Daas, Grey Ballard, L. Grigori, Suraj Kumar, Kathryn Rouse

Citations: 3