ACM Transactions on Parallel Computing: Latest Articles

Introduction to the Special Issue for SPAA’21
IF 1.6
ACM Transactions on Parallel Computing Pub Date : 2023-12-14 DOI: 10.1145/3630608
Y. Azar, Julian Shun
{"title":"Introduction to the Special Issue for SPAA’21","authors":"Y. Azar, Julian Shun","doi":"10.1145/3630608","DOIUrl":"https://doi.org/10.1145/3630608","url":null,"abstract":"","PeriodicalId":42115,"journal":{"name":"ACM Transactions on Parallel Computing","volume":"2001 20","pages":"1 - 1"},"PeriodicalIF":1.6,"publicationDate":"2023-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139001830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Conflict-Resilient Lock-Free Linearizable Calendar Queue
IF 1.6
ACM Transactions on Parallel Computing Pub Date : 2023-12-06 DOI: 10.1145/3635163
Romolo Marotta, Mauro Ianni, Alessandro Pellegrini, F. Quaglia
{"title":"A Conflict-Resilient Lock-Free Linearizable Calendar Queue","authors":"Romolo Marotta, Mauro Ianni, Alessandro Pellegrini, F. Quaglia","doi":"10.1145/3635163","DOIUrl":"https://doi.org/10.1145/3635163","url":null,"abstract":"In the last two decades, great attention has been devoted to the design of non-blocking and linearizable data structures, which enable exploiting the scaled-up degree of parallelism in off-the-shelf shared-memory multi-core machines. In this context, priority queues are highly challenging. Indeed, concurrent attempts to extract the highest-priority item are prone to create detrimental thread conflicts that lead to abort/retry of the operations. In this article, we present the first priority queue that jointly provides: i) lock-freedom and linearizability; ii) conflict resiliency against concurrent extractions; iii) adaptiveness to different contention profiles; and iv) amortized constant-time access for both insertions and extractions. Beyond presenting our solution, we also provide proof of its correctness based on an assertional approach. Also, we present an experimental study on a 64-CPU machine, showing that our proposal provides performance improvements over state-of-the-art non-blocking priority queues.","PeriodicalId":42115,"journal":{"name":"ACM Transactions on Parallel Computing","volume":"89 8","pages":""},"PeriodicalIF":1.6,"publicationDate":"2023-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138595960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
HPS Cholesky: Hierarchical Parallelized Supernodal Cholesky with Adaptive Parameters
ACM Transactions on Parallel Computing Pub Date : 2023-10-26 DOI: 10.1145/3630051
Shengle Lin, Wangdong Yang, Yikun Hu, Qinyun Cai, Minlu Dai, Haotian Wang, Kenli Li
{"title":"HPS Cholesky: Hierarchical Parallelized Supernodal Cholesky with Adaptive Parameters","authors":"Shengle Lin, Wangdong Yang, Yikun Hu, Qinyun Cai, Minlu Dai, Haotian Wang, Kenli Li","doi":"10.1145/3630051","DOIUrl":"https://doi.org/10.1145/3630051","url":null,"abstract":"Sparse supernodal Cholesky on multi-NUMAs is challenging due to the supernode relaxation and load balancing. In this work, we propose a novel approach to improve the performance of sparse Cholesky by combining deep learning with a relaxation parameter and a hierarchical parallelization strategy with NUMA affinity. Specifically, our relaxed supernodal algorithm utilizes a well-trained GCN model to adaptively adjust relaxation parameters based on the sparse matrix’s structure, achieving a proper balance between task-level parallelism and dense computational granularity. Additionally, the hierarchical parallelization maps supernodal tasks to the local NUMA parallel queue and updates contribution blocks in pipeline mode. Furthermore, the stream scheduling with NUMA affinity can further enhance the efficiency of memory access during the numerical factorization. The experimental results show that HPS Cholesky can outperform state-of-the-art libraries, such as Eigen LL^T, CHOLMOD, PaStiX and SuiteSparse on 79.78%, 79.60%, 82.09% and 74.47% of 1128 datasets. It achieves an average speedup of 1.41x over the current optimal relaxation algorithm. Moreover, 70.83% of matrices have surpassed MKL sparse Cholesky on Xeon Gold 6248.","PeriodicalId":42115,"journal":{"name":"ACM Transactions on Parallel Computing","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134907691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Improved Online Scheduling of Moldable Task Graphs under Common Speedup Models
ACM Transactions on Parallel Computing Pub Date : 2023-10-26 DOI: 10.1145/3630052
Lucas Perotin, Hongyang Sun
{"title":"Improved Online Scheduling of Moldable Task Graphs under Common Speedup Models","authors":"Lucas Perotin, Hongyang Sun","doi":"10.1145/3630052","DOIUrl":"https://doi.org/10.1145/3630052","url":null,"abstract":"We consider the online scheduling problem of moldable task graphs on multiprocessor systems for minimizing the overall completion time (or makespan). Moldable job scheduling has been widely studied in the literature, in particular when tasks have dependencies (i.e., task graphs) or when tasks are released on-the-fly (i.e., online). However, few studies have focused on both (i.e., online scheduling of moldable task graphs). In this paper, we design a new online scheduling algorithm for this problem and derive constant competitive ratios under several common yet realistic speedup models (i.e., roofline, communication, Amdahl, and a general combination). These results improve the ones we have shown in the preliminary version of the paper. We also prove, for each speedup model, a lower bound on the competitiveness of any online list scheduling algorithm that allocates processors to a task based only on the task’s parameters and not on its position in the graph. This lower bound matches exactly the competitive ratio of our algorithm for the roofline, communication and Amdahl’s model, and is close to the ratio for the general model. Finally, we provide a lower bound on the competitive ratio of any deterministic online algorithm for the arbitrary speedup model, which is not constant but depends on the number of tasks in the longest path of the graph.","PeriodicalId":42115,"journal":{"name":"ACM Transactions on Parallel Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134908046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Checkpointing strategies to tolerate non-memoryless failures on HPC platforms
ACM Transactions on Parallel Computing Pub Date : 2023-09-22 DOI: 10.1145/3624560
Anne Benoit, Lucas Perotin, Yves Robert, Frédéric Vivien
{"title":"Checkpointing strategies to tolerate non-memoryless failures on HPC platforms","authors":"Anne Benoit, Lucas Perotin, Yves Robert, Frédéric Vivien","doi":"10.1145/3624560","DOIUrl":"https://doi.org/10.1145/3624560","url":null,"abstract":"This paper studies checkpointing strategies for parallel applications subject to failures. The optimal strategy to minimize total execution time, or makespan, is well known when failure IATs obey an Exponential distribution, but it is unknown for non-memoryless failure distributions. We explain why the latter fact is misunderstood in recent literature. We propose a general strategy that maximizes the expected efficiency until the next failure, and we show that this strategy achieves an asymptotically optimal makespan, thereby establishing the first optimality result for arbitrary failure distributions. Through extensive simulations, we show that the new strategy is always at least as good as the Young/Daly strategy for various failure distributions. For distributions with a high infant mortality (such as LogNormal with shape parameter k = 2.51 or Weibull with shape parameter 0.5), the execution time is divided by a factor 1.9 on average, and up to a factor 4.2 for recently deployed platforms.","PeriodicalId":42115,"journal":{"name":"ACM Transactions on Parallel Computing","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136015029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
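For context on the Young/Daly baseline named in the abstract above: under memoryless (Exponential) failures, the classical first-order approximation sets the amount of work between two checkpoints to sqrt(2 · μ · C), where μ is the mean time between failures and C is the cost of one checkpoint. The sketch below only illustrates that baseline formula. The function name, parameter names, and sample numbers are assumptions made for illustration, and the paper's own strategy for non-memoryless failures is not reproduced here.

```python
import math


def young_daly_period(mtbf: float, checkpoint_cost: float) -> float:
    """First-order Young/Daly checkpointing period: sqrt(2 * MTBF * C).

    Both arguments must use the same time unit (for example, seconds).
    """
    if mtbf <= 0 or checkpoint_cost <= 0:
        raise ValueError("mtbf and checkpoint_cost must be positive")
    return math.sqrt(2.0 * mtbf * checkpoint_cost)


# Illustrative (assumed) numbers: MTBF of 20000 s, checkpoint cost of 100 s.
period = young_daly_period(20000.0, 100.0)
```

The formula is a first-order approximation, so it is most trustworthy when the checkpoint cost is small compared with the mean time between failures.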
Distributed Graph Coloring Made Easy
IF 1.6
ACM Transactions on Parallel Computing Pub Date : 2023-08-17 DOI: 10.1145/3605896
Yannic Maus
{"title":"Distributed Graph Coloring Made Easy","authors":"Yannic Maus","doi":"10.1145/3605896","DOIUrl":"https://doi.org/10.1145/3605896","url":null,"abstract":"In this paper, we present a deterministic CONGEST algorithm to compute an O(kΔ)-vertex coloring in O(Δ/k) + log* n rounds, where Δ is the maximum degree of the network graph and k ≥ 1 can be freely chosen. The algorithm is extremely simple: Each node locally computes a sequence of colors and then it tries colors from the sequence in batches of size k. Our algorithm subsumes many important results in the history of distributed graph coloring as special cases, including Linial’s color reduction [Linial, FOCS’87], the celebrated locally iterative algorithm from [Barenboim, Elkin, Goldenberg, PODC’18], and various algorithms to compute defective and arbdefective colorings. Our algorithm can smoothly scale between several of these previous results and also simplifies the state of the art (Δ + 1)-coloring algorithm. At the cost of losing some of the algorithm’s simplicity we also provide an O(kΔ)-coloring algorithm in O(√(Δ/k)) + log* n rounds. We also provide improved deterministic algorithms for ruling sets, and, additionally, we provide a tight characterization for 1-round color reduction algorithms.","PeriodicalId":42115,"journal":{"name":"ACM Transactions on Parallel Computing","volume":"1 1","pages":""},"PeriodicalIF":1.6,"publicationDate":"2023-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42561903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Fast Algorithm for Aperiodic Linear Stencil Computation using Fast Fourier Transforms
IF 1.6
ACM Transactions on Parallel Computing Pub Date : 2023-07-24 DOI: 10.1145/3606338
Zafar Ahmad, R. Chowdhury, Rathish Das, P. Ganapathi, Aaron Gregory, Yimin Zhu
{"title":"A Fast Algorithm for Aperiodic Linear Stencil Computation using Fast Fourier Transforms","authors":"Zafar Ahmad, R. Chowdhury, Rathish Das, P. Ganapathi, Aaron Gregory, Yimin Zhu","doi":"10.1145/3606338","DOIUrl":"https://doi.org/10.1145/3606338","url":null,"abstract":"Stencil computations are widely used to simulate the change of state of physical systems across a multidimensional grid over multiple timesteps. The state-of-the-art techniques in this area fall into three groups: cache-aware tiled looping algorithms, cache-oblivious divide-and-conquer trapezoidal algorithms, and Krylov subspace methods. In this paper, we present two efficient parallel algorithms for performing linear stencil computations. Current direct solvers in this domain are computationally inefficient, and Krylov methods require manual labor and mathematical training. We solve these problems for linear stencils by using DFT preconditioning on a Krylov method to achieve a direct solver which is both fast and general. Indeed, while all currently available algorithms for solving general linear stencils perform Θ(NT) work, where N is the size of the spatial grid and T is the number of timesteps, our algorithms perform o(NT) work. To the best of our knowledge, we give the first algorithms that use fast Fourier transforms to compute final grid data by evolving the initial data for many timesteps at once. Our algorithms handle both periodic and aperiodic boundary conditions, and achieve polynomially better performance bounds (i.e., computational complexity and parallel runtime) than all other existing solutions. Initial experimental results show that implementations of our algorithms that evolve grids of roughly 10^7 cells for around 10^5 timesteps run orders of magnitude faster than state-of-the-art implementations for periodic stencil problems, and 1.3× to 8.5× faster for aperiodic stencil problems. Code Repository: https://github.com/TEAlab/FFTStencils","PeriodicalId":42115,"journal":{"name":"ACM Transactions on Parallel Computing","volume":" ","pages":""},"PeriodicalIF":1.6,"publicationDate":"2023-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43986447","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
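The idea in the abstract above of evolving the initial data for many timesteps at once can be illustrated in its simplest textbook setting: a 1D linear stencil with periodic boundaries. One timestep is then a circular convolution, so T timesteps amount to multiplying the DFT of the grid by the T-th power of the DFT of the stencil kernel. The following is a minimal pure-Python sketch of that periodic case only. It is not the authors' algorithm (which also handles aperiodic boundaries), and every function name here is hypothetical.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def ifft(a):
    """Inverse FFT: conjugate twiddles, then scale by 1/n."""
    n = len(a)
    return [x / n for x in fft(a, invert=True)]


def stencil_step(u, weights):
    """One explicit timestep: new[i] = sum over d of weights[d] * u[(i + d) % n]."""
    n = len(u)
    return [sum(c * u[(i + d) % n] for d, c in weights.items()) for i in range(n)]


def stencil_fft(u, weights, steps):
    """Evolve `steps` timesteps at once using the convolution theorem."""
    n = len(u)
    # One step equals a circular convolution with kernel h, h[(-d) % n] = weights[d].
    kernel = [0.0] * n
    for d, c in weights.items():
        kernel[(-d) % n] += c
    symbol = fft(kernel)  # per-frequency multiplier of a single timestep
    u_hat = fft(u)
    evolved = [uh * s ** steps for uh, s in zip(u_hat, symbol)]
    return [x.real for x in ifft(evolved)]


# Usage: compare against the step-by-step loop on a small, assumed example.
grid = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, 0.0, 4.0]
heat = {-1: 0.25, 0: 0.5, 1: 0.25}  # illustrative averaging stencil
looped = grid
for _ in range(5):
    looped = stencil_step(looped, heat)
direct = stencil_fft(grid, heat, 5)
max_diff = max(abs(a - b) for a, b in zip(looped, direct))
```

The loop costs work proportional to N·T, while the transform-based version needs a constant number of FFTs plus one power per frequency, which is the intuition behind sub-Θ(NT) work in the periodic case.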
The Computational Complexity of Feasibility Analysis for Conditional DAG Tasks
IF 1.6
ACM Transactions on Parallel Computing Pub Date : 2023-07-05 DOI: 10.1145/3606342
Sanjoy Baruah, A. Marchetti-Spaccamela
{"title":"The Computational Complexity of Feasibility Analysis for Conditional DAG Tasks","authors":"Sanjoy Baruah, A. Marchetti-Spaccamela","doi":"10.1145/3606342","DOIUrl":"https://doi.org/10.1145/3606342","url":null,"abstract":"The Conditional DAG (CDAG) task model is used for modeling multiprocessor real-time systems containing conditional expressions for which outcomes are not known prior to their evaluation. Feasibility analysis for CDAG tasks upon multiprocessor platforms is shown to be complete for the complexity class PSPACE; assuming NP ≠ PSPACE, this result rules out the use of Integer Linear Programming solvers for solving this problem efficiently. It is further shown that there can be no pseudo-polynomial time algorithm that solves this problem unless P = PSPACE.","PeriodicalId":42115,"journal":{"name":"ACM Transactions on Parallel Computing","volume":"10 1","pages":"1 - 22"},"PeriodicalIF":1.6,"publicationDate":"2023-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48543350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Algorithms for Right-Sizing Heterogeneous Data Centers
IF 1.6
ACM Transactions on Parallel Computing Pub Date : 2023-05-10 DOI: 10.1145/3595286
S. Albers, Jens Quedenfeld
{"title":"Algorithms for Right-Sizing Heterogeneous Data Centers","authors":"S. Albers, Jens Quedenfeld","doi":"10.1145/3595286","DOIUrl":"https://doi.org/10.1145/3595286","url":null,"abstract":"Power consumption is a dominant and still growing cost factor in data centers. In time periods with low load, the energy consumption can be reduced by powering down unused servers. We resort to a model introduced by Lin, Wierman, Andrew and Thereska [23, 24] that considers data centers with identical machines, and generalize it to heterogeneous data centers with d different server types. The operating cost of a server depends on its load and is modeled by an increasing, convex function for each server type. In contrast to earlier work, we consider the discrete setting, where the number of active servers must be integral. Thereby, we seek truly feasible solutions. For homogeneous data centers (d = 1), both the offline and the online problem were solved optimally in [3, 4]. In this paper, we study heterogeneous data centers with general time-dependent operating cost functions. We develop an online algorithm based on a work function approach which achieves a competitive ratio of 2d + 1 + ϵ for any ϵ > 0. For time-independent operating cost functions, the competitive ratio can be reduced to 2d + 1. There is a lower bound of 2d shown in [5], so our algorithm is nearly optimal. For the offline version, we give a graph-based (1 + ϵ)-approximation algorithm. Additionally, our offline algorithm is able to handle time-variable data-center sizes.","PeriodicalId":42115,"journal":{"name":"ACM Transactions on Parallel Computing","volume":" ","pages":""},"PeriodicalIF":1.6,"publicationDate":"2023-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44289659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Non-Clairvoyant Scheduling with Predictions
IF 1.6
ACM Transactions on Parallel Computing Pub Date : 2023-05-02 DOI: 10.1145/3593969
Sungjin Im, Ravi Kumar, Mahshid Montazer Qaem, Manish Purohit
{"title":"Non-Clairvoyant Scheduling with Predictions","authors":"Sungjin Im, Ravi Kumar, Mahshid Montazer Qaem, Manish Purohit","doi":"10.1145/3593969","DOIUrl":"https://doi.org/10.1145/3593969","url":null,"abstract":"In the single-machine non-clairvoyant scheduling problem, the goal is to minimize the total completion time of jobs whose processing times are unknown a priori. We revisit this well-studied problem and consider the question of how to effectively use (possibly erroneous) predictions of the processing times. We study this question from ground zero by first asking what constitutes a good prediction; we then propose a new measure to gauge prediction quality and design scheduling algorithms with strong guarantees under this measure. Our approach to derive a prediction error measure based on natural desiderata could find applications for other online problems.","PeriodicalId":42115,"journal":{"name":"ACM Transactions on Parallel Computing","volume":" ","pages":""},"PeriodicalIF":1.6,"publicationDate":"2023-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44598514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0