Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures: Latest Papers

Automatic HBM Management: Models and Algorithms
Daniel DeLayo, Kenny Zhang, Kunal Agrawal, M. A. Bender, Jonathan W. Berry, Rathish Das, Benjamin Moseley, C. Phillips
DOI: https://doi.org/10.1145/3490148.3538570
Published: 2022-07-11
Abstract: Some past and future supercomputer nodes incorporate High-Bandwidth Memory (HBM). Compared to standard DRAM, HBM has similar latency, higher bandwidth, and lower capacity. In this paper, we evaluate algorithms for managing High-Bandwidth Memory automatically. Previous work suggests that, in the worst case, performance is extremely sensitive to the policy for managing the channel to DRAM. Prior theory shows that a priority-based scheme (where there is a static strict priority order among p threads for channel access) is O(1)-competitive, but FIFO is not, and in the worst case is Ω(p)-competitive. Following this theoretical guidance would be a disruptive change for vendors, who currently use FIFO variants in their DRAM controller hardware. Our goal is to determine theoretically and empirically whether we can justify recommending investment in priority-based DRAM controller hardware. In order to experiment with DRAM channel protocols, we chose a theoretical model, validated it against real hardware, and implemented a basic simulator. We corroborated the previous theoretical results for the model, conducted a parameter sweep while running our simulator on address traces from memory-bandwidth-bound codes (GNU sort and TACO sparse matrix-vector product), and designed better channel-access algorithms.
Citations: 2
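To make the contrast between the two channel policies concrete, here is a minimal sketch of a single channel arbitrating among pending requests. It assumes every request occupies the channel for exactly one time step. The names `arbitrate` and `POLICIES` and the unit-time model are illustrative assumptions, not the paper's model or simulator, and the sketch says nothing about competitive ratios.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

Request = Tuple[int, int]  # (arrival_time, thread_id)

# The pending request with the smallest key wins the channel in each step.
POLICIES: Dict[str, Callable[[Request], Tuple[int, int]]] = {
    "fifo": lambda r: (r[0], r[1]),      # oldest first, ties by thread id
    "priority": lambda r: (r[1], r[0]),  # static strict order by thread id
}


def arbitrate(requests: List[Request], policy: str) -> List[Request]:
    """Return the order in which one channel serves the requests,
    assuming each request occupies the channel for one time step."""
    key = POLICIES[policy]
    arrivals = deque(sorted(requests))
    pending: List[Request] = []
    served: List[Request] = []
    t = 0
    while arrivals or pending:
        # Admit every request that has arrived by time t.
        while arrivals and arrivals[0][0] <= t:
            pending.append(arrivals.popleft())
        if pending:
            winner = min(pending, key=key)
            pending.remove(winner)
            served.append(winner)
        t += 1
    return served


reqs = [(0, 2), (0, 1), (1, 0)]
print(arbitrate(reqs, "fifo"))      # [(0, 1), (0, 2), (1, 0)]
print(arbitrate(reqs, "priority"))  # [(0, 1), (1, 0), (0, 2)]
```

Under FIFO, thread 2's older request is served before thread 0's newer one. Under the static priority order, thread 0 jumps ahead because of its rank alone.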
Massively Parallel Algorithms for b-Matching
M. Ghaffari, C. Grunau, Slobodan Mitrovic
DOI: https://doi.org/10.1145/3490148.3538589
Published: 2022-07-11
Abstract: This paper presents an O(log log d̄)-round massively parallel algorithm for (1 + ε)-approximation of maximum weighted b-matchings, using near-linear memory per machine. Here d̄ denotes the average degree in the graph and ε is an arbitrarily small positive constant. Recall that b-matching is the natural and well-studied generalization of the matching problem where different vertices are allowed to have different numbers of incident edges in the matching.
Citations: 2
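As a concrete illustration of the b-matching constraint (vertex v may be an endpoint of at most b[v] chosen edges), here is a simple sequential greedy baseline. It is not the paper's massively parallel (1 + ε)-approximation. The function name `greedy_b_matching` and the edge-list input format are assumptions made for this sketch.

```python
from typing import Dict, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable, float]  # (u, v, weight)


def greedy_b_matching(edges: List[Edge], b: Dict[Hashable, int]) -> List[Edge]:
    """Scan edges by decreasing weight and keep an edge whenever both
    endpoints still have spare capacity; vertex v may be matched b[v] times."""
    used = {v: 0 for v in b}
    chosen: List[Edge] = []
    for u, v, w in sorted(edges, key=lambda e: e[2], reverse=True):
        if used[u] < b[u] and used[v] < b[v]:
            chosen.append((u, v, w))
            used[u] += 1
            used[v] += 1
    return chosen


edges = [("a", "b", 5), ("a", "c", 4), ("b", "c", 3), ("c", "d", 2)]
b = {"a": 2, "b": 1, "c": 1, "d": 1}
print(greedy_b_matching(edges, b))  # [('a', 'b', 5), ('a', 'c', 4)]
```

If every b[v] is 1, this reduces to ordinary greedy matching. For example, lowering b["a"] to 1 yields [('a', 'b', 5), ('c', 'd', 2)].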
Parallel Batch-Dynamic Algorithms for k-Core Decomposition and Related Graph Problems
Quanquan C. Liu, Jessica Shi, Shangdi Yu, Laxman Dhulipala, Julian Shun
DOI: https://doi.org/10.1145/3490148.3538569
Published: 2022-07-11
Abstract: Maintaining a k-core decomposition quickly in a dynamic graph has important applications in network analysis. The main challenge in designing efficient exact algorithms is that a single update to the graph can cause significant global changes. Our paper focuses on approximation algorithms with small approximation factors that are much more efficient than what exact algorithms can achieve.
Citations: 2
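For readers unfamiliar with k-core decomposition, the classic static exact algorithm is "peeling". It repeatedly deletes a minimum-degree vertex. A vertex's core number is the largest minimum degree observed up to the moment it is deleted. The sketch below is that static baseline, not the paper's batch-dynamic approximation. It also shows why exact maintenance is hard: one deletion can change the degrees that drive all later steps. The function name `core_numbers` and the adjacency-set input are assumptions.

```python
import heapq
from typing import Dict, Set


def core_numbers(adj: Dict[str, Set[str]]) -> Dict[str, int]:
    """Static exact k-core decomposition by peeling (min-degree deletion)."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    heap = [(d, v) for v, d in degree.items()]
    heapq.heapify(heap)
    removed: Set[str] = set()
    core: Dict[str, int] = {}
    k = 0
    while heap:
        d, v = heapq.heappop(heap)
        if v in removed or d != degree[v]:
            continue  # stale entry left behind by an earlier degree decrease
        k = max(k, d)  # core numbers never decrease along the peeling order
        core[v] = k
        removed.add(v)
        for u in adj[v]:
            if u not in removed:
                degree[u] -= 1
                heapq.heappush(heap, (degree[u], u))
    return core


# A triangle a-b-c with a pendant vertex d attached to a.
adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(core_numbers(adj))  # {'d': 1, 'a': 2, 'b': 2, 'c': 2}
```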
PREP-UC: A Practical Replicated Persistent Universal Construction
Gaetano Coccimiglio, Trevor Brown, Srivatsan Ravi
DOI: https://doi.org/10.1145/3490148.3538568
Published: 2022-07-11
Abstract: Designing and implementing correct concurrent data structures is non-trivial and often error-prone. The recent commercial availability of non-volatile memory has prompted many researchers to also consider designing concurrent data structures that persist shared state, allowing the data structure to be recovered following a power failure. These so-called persistent concurrent data structures further complicate the process of achieving correct and efficient implementations. Universal constructions (UCs), which produce a concurrent object given a sequential object, have been studied extensively in the space of volatile shared memory as a means of more easily implementing correct concurrent data structures. In contrast, there are only a handful of persistent universal constructions (PUCs), which, beyond producing a concurrent object from a sequential object, guarantee that the object can be recovered following a crash. Existing PUCs satisfy the correctness condition of durable linearizability, which requires that operations are persisted before they complete. Satisfying the weaker correctness condition of buffered durable linearizability allows for improved performance at the cost of failing to recover some completed operations following a crash. In this work we design and implement both a buffered durable linearizable and a durable linearizable PUC based on the node replication UC. We demonstrate that we can achieve significantly better performance satisfying buffered durable linearizability while also restricting the maximum number of operations that can be lost after a crash.
Citations: 0
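The trade-off between durable and buffered durable linearizability can be sketched with a toy universal construction. In this sketch, a single global lock orders operations on a wrapped sequential object, and completed operations are "persisted" in batches, so a crash loses at most `flush_every - 1` of them. This is a deliberate simplification: PREP-UC is built on node replication, not a global lock. The class `BufferedDurableUC`, its methods, and the plain-list stand-in for non-volatile memory are all hypothetical.

```python
import threading
from typing import Any, Callable, List, Tuple

Op = Tuple[str, Tuple[Any, ...]]  # (method name, arguments)


class BufferedDurableUC:
    """Toy UC: a lock orders operations on a sequential object, and completed
    operations are persisted in batches of `flush_every`."""

    def __init__(self, factory: Callable[[], Any], flush_every: int) -> None:
        self._factory = factory
        self._obj = factory()
        self._lock = threading.Lock()
        self._flush_every = flush_every
        self._buffer: List[Op] = []     # completed, not yet persisted
        self.persisted: List[Op] = []   # stands in for non-volatile memory

    def apply(self, method: str, *args: Any) -> Any:
        with self._lock:
            result = getattr(self._obj, method)(*args)
            self._buffer.append((method, args))
            if len(self._buffer) >= self._flush_every:
                self.persisted.extend(self._buffer)
                self._buffer.clear()
            return result

    def crash_and_recover(self) -> Any:
        """Drop volatile state and rebuild the object from the persisted log."""
        with self._lock:
            self._buffer.clear()
            self._obj = self._factory()
            for method, args in self.persisted:
                getattr(self._obj, method)(*args)
            return self._obj


uc = BufferedDurableUC(list, flush_every=3)
for i in range(5):
    uc.apply("append", i)
print(uc.crash_and_recover())  # [0, 1, 2]: the buffered appends of 3 and 4 are lost
```

Setting `flush_every=1` persists each operation before it returns, which loosely mirrors durable linearizability. Larger values batch the persistence work but bound how many completed operations a crash may lose.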
Online Parallel Paging with Optimal Makespan
Kunal Agrawal, M. A. Bender, Rathish Das, William Kuszmaul, E. Peserico, Michele Scquizzato
DOI: https://doi.org/10.1145/3490148.3538577
Published: 2022-07-11
Abstract: The classical paging problem can be described as follows: given a cache that can hold up to k pages (or blocks) and a sequence of requests to pages, how should we manage the cache so as to maximize performance, or, in other words, complete the sequence as quickly as possible? Whereas this sequential paging problem has been well understood for decades, the parallel version, where the cache is shared among p processors each issuing its own sequence of page requests, has been much more resistant. In this problem we are given p request sequences R_1, R_2, ..., R_p, each of which accesses a disjoint set of pages, and we ask: how should the paging algorithm manage the cache to optimize the completion time of all sequences (i.e., the makespan)? As for the classical sequential problem, the goal is to design an online paging algorithm that achieves an optimal competitive ratio, using O(1) resource augmentation.
Citations: 3
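To ground the makespan objective, the sketch below evaluates a deliberately naive strategy. The cache of size k is split statically, giving each of the p sequences k // p slots, and each sequence is run under LRU. It assumes a hit costs 1 time step and a miss costs `miss_cost` steps; the cost model, function names, and parameters are assumptions for illustration. The paper's question is how to share the cache better than such a fixed split.

```python
from collections import OrderedDict
from typing import Hashable, List


def lru_time(requests: List[Hashable], cache_size: int, miss_cost: int) -> int:
    """Completion time of one request sequence under LRU, assuming a hit
    takes 1 time step and a miss takes miss_cost steps."""
    cache: "OrderedDict[Hashable, None]" = OrderedDict()
    time = 0
    for page in requests:
        if page in cache:
            cache.move_to_end(page)        # mark as most recently used
            time += 1
        else:
            time += miss_cost
            cache[page] = None
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used page
    return time


def static_partition_makespan(seqs: List[List[Hashable]], k: int, miss_cost: int) -> int:
    """Naive baseline: each of the p sequences gets k // p private slots.
    The sequences then never interfere, so the makespan is the slowest one."""
    share = k // len(seqs)
    return max(lru_time(seq, share, miss_cost) for seq in seqs)


seqs = [["a", "b", "a", "b"], ["x", "y", "z", "x"]]
print(static_partition_makespan(seqs, k=4, miss_cost=10))  # 40
```

Here the first sequence finishes at time 22. The second cycles through three pages in two slots, misses on every request, and sets the makespan of 40.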
Approximate Dynamic Balanced Graph Partitioning
Harald Räcke, Stefan Schmid, R. Zabrodin
DOI: https://doi.org/10.1145/3490148.3538563
Published: 2022-07-11
Abstract: Networked systems are increasingly flexible and reconfigurable. This enables demand-aware infrastructures whose resources can be adjusted according to the traffic pattern they currently serve. This paper revisits the dynamic balanced graph partitioning problem, a generalization of the classic balanced graph partitioning problem. We are given a set P of n = kℓ processes that communicate over time according to a given request sequence σ. The processes are assigned to ℓ servers (each of capacity k), and a scheduler can change this assignment dynamically to reduce communication costs, at cost α per node move. Avin et al. showed an Ω(k) lower bound on the competitive ratio of any deterministic online algorithm, even in a model with resource augmentation, and presented an O(k log k)-competitive online algorithm. We study the offline version of this problem, where σ is known to the algorithm. Our main contribution is a polynomial-time algorithm that provides an O(log n)-approximation with resource augmentation. Our algorithm relies on an integer linear program formulation in a metric space with spreading constraints. We relax the formulation to a linear program and employ Bartal's clustering algorithm in a novel way to round it.
Citations: 2
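The cost model in this abstract can be made concrete with a function that prices a given schedule of assignments. The abstract states the move cost α per node move. The unit cost for each request between processes on different servers is an assumption here, following the usual formulation of this problem. The function `schedule_cost` and its input format are hypothetical, and it evaluates a schedule rather than computing the paper's O(log n)-approximation.

```python
from collections import Counter
from typing import Dict, Hashable, List, Tuple

Assignment = Dict[Hashable, int]  # process -> server


def schedule_cost(
    requests: List[Tuple[Hashable, Hashable]],
    schedule: List[Assignment],
    initial: Assignment,
    alpha: float,
    k: int,
) -> float:
    """Total cost when schedule[i] is the assignment in force while request i
    is served: alpha per process moved since the previous assignment, plus 1
    per request whose two endpoints sit on different servers."""
    if len(schedule) != len(requests):
        raise ValueError("need one assignment per request")
    cost = 0.0
    prev = initial
    for (u, v), assign in zip(requests, schedule):
        if max(Counter(assign.values()).values()) > k:
            raise ValueError("a server holds more than k processes")
        cost += alpha * sum(1 for p in assign if assign[p] != prev[p])
        if assign[u] != assign[v]:
            cost += 1
        prev = assign
    return cost


initial = {"a": 0, "b": 0, "c": 1, "d": 1}
swapped = {"a": 0, "c": 0, "b": 1, "d": 1}
reqs = [("a", "c")] * 3
print(schedule_cost(reqs, [initial] * 3, initial, alpha=1.0, k=2))  # 3.0
print(schedule_cost(reqs, [swapped] * 3, initial, alpha=1.0, k=2))  # 2.0
```

Swapping b and c up front costs 2α but makes the repeated a-c requests free. Whether a move pays off depends on future requests, which is why the offline setting, where σ is known, admits better guarantees than the online one.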
Balancing Flow Time and Energy Consumption
Sami Davies, S. Khuller, Shi-Han Zhang
DOI: https://doi.org/10.1145/3490148.3538582
Published: 2022-06-03
Abstract: In this paper, we study the following batch scheduling model: find a schedule that minimizes total flow time for n uniform-length jobs, with release times and deadlines, where the machine is only actively processing jobs in at most k synchronized batches of size at most B. Prior work on such batch scheduling models has considered only feasibility, with no regard to the flow time of the schedule. However, algorithms that minimize the cost from the scheduler's perspective, such as ones that minimize the active time of the processor, can result in schedules where the total flow time is arbitrarily high [15]. Such schedules are not valuable from the perspective of the client. In response, our work provides dynamic programs that minimize flow time subject to active time constraints. Our main contribution focuses on jobs with agreeable deadlines; for such job instances, we introduce dynamic programs that achieve runtimes of O(B·k·n) for unit jobs and O(B·k·n^5) for uniform-length jobs. These results improve upon our modification of a different, classical dynamic programming approach by Baptiste. While the modified DP works when deadlines are non-agreeable, this solution is more expensive, with runtime O(B·k^2·n^7) [7].
Citations: 1
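The tension between active time and flow time shows up even on a tiny instance. The sketch below validates a batch schedule against the constraints named in the abstract (at most k batches, each of size at most B, every job inside its release/deadline window) and returns its total flow time. It assumes unit jobs and integer time slots, with a batch starting at slot t finishing at t + 1. The function name and schedule format are illustrative assumptions, and this is not the paper's dynamic program.

```python
from typing import Dict, List, Set, Tuple

Job = Tuple[int, int]  # (release, deadline); every job takes one time unit


def total_flow_time(jobs: List[Job], batches: Dict[int, List[int]], k: int, B: int) -> int:
    """Validate a batch schedule and return its total flow time.

    batches maps a start slot t to the job indices processed in [t, t + 1).
    A job's flow time is its completion time t + 1 minus its release time."""
    if len(batches) > k:
        raise ValueError("more than k active batches")
    done: Set[int] = set()
    total = 0
    for t, members in batches.items():
        if len(members) > B:
            raise ValueError(f"batch at slot {t} has more than B jobs")
        for j in members:
            release, deadline = jobs[j]
            if j in done or not (release <= t and t + 1 <= deadline):
                raise ValueError(f"job {j} is duplicated or outside its window")
            done.add(j)
            total += t + 1 - release
    if len(done) != len(jobs):
        raise ValueError("some job is never scheduled")
    return total


jobs = [(0, 4), (0, 4), (2, 4)]
late = {2: [0, 1], 3: [2]}
early = {0: [0, 1], 2: [2]}
print(total_flow_time(jobs, late, k=2, B=2))   # 8
print(total_flow_time(jobs, early, k=2, B=2))  # 3
```

Both schedules are feasible and use the same two active batches, so they look identical from the scheduler's active-time perspective. Their total flow times still differ (8 versus 3), which is the gap the paper's dynamic programs target.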
Fully Polynomial-Time Distributed Computation in Low-Treewidth Graphs
Taisuke Izumi, Naoki Kitamura, Takamasa Naruse, Gregory Schwartzman
DOI: https://doi.org/10.1145/3490148.3538590
Published: 2022-05-30
Abstract: We consider global problems, i.e., problems that take at least diameter time, even when the bandwidth is not restricted. We show that all problems considered admit efficient solutions in low-treewidth graphs.
Citations: 4
Adaptive Massively Parallel Algorithms for Cut Problems
M. Hajiaghayi, Marina Knittel, J. Olkowski, Hamed Saleh
DOI: https://doi.org/10.1145/3490148.3538576
Published: 2022-05-27
Abstract: We study the Weighted Min Cut problem in the Adaptive Massively Parallel Computation (AMPC) model. In 2019, Behnezhad et al. [3] introduced the AMPC model as an extension of the Massively Parallel Computation (MPC) model. In the past decade, research on highly scalable algorithms has had a significant impact on many massive systems. The MPC model, introduced in 2010 by Karloff et al. [16] as an abstraction of famous practical frameworks such as MapReduce, Hadoop, Flume, and Spark, has been at the forefront of this research. While great strides have been made in creating highly efficient MPC algorithms for a range of problems, recent progress has been limited by the 1-vs-2 Cycle Conjecture [20], which postulates that the simple problem of distinguishing between one and two cycles requires Ω(log n) MPC rounds. In the AMPC model, each machine has adaptive read access to a distributed hash table even when communication is restricted (i.e., in the middle of a round). While remaining practical [4], this gives algorithms the power to bypass limitations like the 1-vs-2 Cycle Conjecture.
Citations: 2
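The "adaptive read access" that separates AMPC from MPC can be illustrated with pointer chasing. In this sketch, the key of each read is the answer to the previous read, so one machine can walk several steps along a cycle within a single round, up to its read budget. The dictionary standing in for the distributed hash table, the function name `pointer_chase`, and the `budget` parameter are assumptions for illustration. The abstract does not describe how the paper uses adaptivity for min cut.

```python
from typing import Dict, Hashable, Tuple


def pointer_chase(table: Dict[Hashable, Hashable], start: Hashable, budget: int) -> Tuple[Hashable, int]:
    """Follow successor pointers from `start` with at most `budget` reads.
    Each read's key is the answer to the previous read: the kind of
    adaptivity AMPC allows within a round and MPC does not.
    Returns (vertex reached, number of reads); stops early on returning to start."""
    current = start
    for step in range(1, budget + 1):
        current = table[current]  # one adaptive read of the shared table
        if current == start:
            return current, step  # closed a cycle of length `step`
    return current, budget


one_cycle = {0: 1, 1: 2, 2: 3, 3: 0}
two_cycles = {0: 1, 1: 0, 2: 3, 3: 2}
print(pointer_chase(one_cycle, 0, budget=10))   # (0, 4)
print(pointer_chase(two_cycles, 0, budget=10))  # (0, 2)
```

With a budget of at least n reads, one walk distinguishes one cycle of length n from two shorter cycles. Realistic per-machine budgets are far smaller than n, so AMPC algorithms have to combine many short walks across rounds.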
Brief Announcement: Tight Memory-Independent Parallel Matrix Multiplication Communication Lower Bounds
Hussam Al Daas, Grey Ballard, L. Grigori, Suraj Kumar, Kathryn Rouse
DOI: https://doi.org/10.1145/3490148.3538552
Published: 2022-05-26
Abstract: Communication lower bounds have long been established for matrix multiplication algorithms. However, most methods of asymptotic analysis have either ignored constant factors or not obtained the tightest possible values. The main result of this work is establishing memory-independent communication lower bounds with tight constants for parallel matrix multiplication. Our constants improve on previous work in each of three cases that depend on the relative sizes of the matrix aspect ratios and the number of processors.
Citations: 3
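The standard reasoning behind memory-independent bounds with explicit constants can be sketched in a few lines. This is the classical Loomis–Whitney argument, not the paper's proof. The dimension names (A is m × k, B is k × n) are assumptions. The final step further assumes the input and output data are evenly distributed, so a processor initially owns at most (mn + mk + nk)/P entries. Let V be the set of scalar products a_{il} b_{lj} computed by a processor that performs at least the average number of them, and let V_A, V_B, V_C be the projections of V onto the entries of A, B, and C.

```latex
% A: m x k, B: k x n, C = AB: m x n (dimension names assumed)
% V: scalar products a_{il} b_{lj} computed by one processor with at least average work
% V_A, V_B, V_C: projections of V onto entries (i,l) of A, (l,j) of B, (i,j) of C
\begin{align*}
  F &= \frac{mnk}{P} \le |V| \le \sqrt{|V_A|\,|V_B|\,|V_C|}
    && \text{(pigeonhole; Loomis--Whitney)} \\
  |V_A| + |V_B| + |V_C| &\ge 3\bigl(|V_A|\,|V_B|\,|V_C|\bigr)^{1/3} \ge 3F^{2/3}
    && \text{(AM--GM)} \\
  W &\ge 3\left(\frac{mnk}{P}\right)^{2/3} - \frac{mn + mk + nk}{P}
    && \text{(subtract data owned initially)}
\end{align*}
```

This bound is always valid. However, it ignores the caps |V_A| ≤ mk, |V_B| ≤ kn, and |V_C| ≤ mn, so it can be loose when one matrix dimension is small relative to P. That dependence on aspect ratios and processor count is what the abstract's three cases address.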