Parallel Computing: Latest Articles

Reviewer acknowledgment
IF 1.4, Zone 4, Computer Science
Parallel Computing, Volume 115, Article 103004 Pub Date : 2023-02-01 DOI: 10.1016/S0167-8191(23)00010-8
Citations: 0
Heterogeneous sparse matrix–vector multiplication via compressed sparse row format
IF 1.4, Zone 4, Computer Science
Parallel Computing, Volume 115, Article 102997 Pub Date : 2023-02-01 DOI: 10.1016/j.parco.2023.102997
Phillip Allen Lane, Joshua Dennis Booth
Abstract: Sparse matrix–vector multiplication (SpMV) is one of the most important kernels in high-performance computing (HPC), yet SpMV normally performs poorly on many devices. Because of this, SpMV normally requires special care in how the matrix is stored and tuned for a given device. Moreover, HPC is facing heterogeneous hardware containing multiple different compute units, e.g., many-core CPUs and GPUs. Therefore, an emerging goal has been to produce heterogeneous formats and methods that allow critical kernels, e.g., SpMV, to be executed on different devices with portable performance and minimal changes to format and method. This paper presents a heterogeneous format based on CSR, named CSR-k, that can be tuned quickly and outperforms the average performance of Intel MKL on Intel Xeon Platinum 838 and AMD Epyc 7742 CPUs while still outperforming NVIDIA's cuSPARSE and Sandia National Laboratories' KokkosKernels on NVIDIA A100 and V100 for regular sparse matrices, i.e., sparse matrices where the number of nonzeros per row has a variance ≤ 10, such as those commonly generated from two- and three-dimensional finite difference and finite element problems. In particular, CSR-k achieves this with reordering and by grouping rows into a hierarchical structure of super-rows and super-super-rows that are represented by just a few extra arrays of pointers. Due to its simplicity, a model can be tuned for a device, and this model can be used to select super-row and super-super-row sizes in constant time.
Citations: 0
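Note: the abstract above describes grouping CSR rows into super-rows through a few extra pointer arrays. The following is a minimal sketch of that idea with a single extra level, not the authors' implementation. The names `make_super_rows`, `csr_spmv_super_rows`, and `sr_ptr` are hypothetical, and the fixed super-row size is an assumption made for illustration (the paper selects sizes with a tuned model).

```python
def make_super_rows(n_rows, size):
    """Hypothetical helper: group consecutive rows into super-rows of a fixed size.

    Returns a pointer array sr_ptr such that super-row s covers
    rows sr_ptr[s] .. sr_ptr[s + 1] - 1.
    """
    sr_ptr = list(range(0, n_rows, size))
    sr_ptr.append(n_rows)
    return sr_ptr


def csr_spmv_super_rows(row_ptr, col_idx, vals, sr_ptr, x):
    """Compute y = A @ x for a CSR matrix, iterating super-row by super-row.

    row_ptr, col_idx, vals are the standard CSR arrays. sr_ptr is the one
    extra pointer array; each super-row is a natural unit of work to hand
    to a thread or thread block.
    """
    y = [0.0] * (len(row_ptr) - 1)
    for s in range(len(sr_ptr) - 1):                  # loop over super-rows
        for i in range(sr_ptr[s], sr_ptr[s + 1]):     # rows inside this super-row
            acc = 0.0
            for p in range(row_ptr[i], row_ptr[i + 1]):  # nonzeros of row i
                acc += vals[p] * x[col_idx[p]]
            y[i] = acc
    return y


# 4x4 tridiagonal matrix with 2 on the diagonal and -1 off the diagonal.
row_ptr = [0, 2, 5, 8, 10]
col_idx = [0, 1, 0, 1, 2, 1, 2, 3, 2, 3]
vals = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
sr_ptr = make_super_rows(4, 2)   # [0, 2, 4]
print(csr_spmv_super_rows(row_ptr, col_idx, vals, sr_ptr, [1.0, 1.0, 1.0, 1.0]))
# prints [1.0, 0.0, 0.0, 1.0]
```

The underlying CSR arrays are untouched, so the same matrix data can serve different devices; only the grouping array changes with the chosen super-row size.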
Efficient parallel reduction of bandwidth for symmetric matrices
IF 1.4, Zone 4, Computer Science
Parallel Computing, Volume 28 1, pages 102998 Pub Date : 2023-01-01 DOI: 10.2139/ssrn.4050432
Valeriy Manin, B. Lang
Citations: 0
Efficient parallel branch-and-bound approaches for exact graph edit distance problem
IF 1.4, Zone 4, Computer Science
Parallel Computing, Volume 114, Article 102984 Pub Date : 2022-12-01 DOI: 10.1016/j.parco.2022.102984
Adel Dabah, Ibrahim Chegrane, Saïd Yahiaoui, Ahcene Bendjoudi, Nadia Nouali-Taboudjemat
Abstract: Graph Edit Distance (GED) is a well-known measure used in graph matching to quantify the similarity or dissimilarity between two graphs by computing the minimum cost of the edit operations needed to transform one graph into another. This process, which appears simple, is known to be NP-hard and time-consuming because the search space grows exponentially. One way to solve this problem optimally is to use Branch and Bound (B&B) algorithms, which reduce the computation time required to explore the whole search space by performing an implicit enumeration of the search space, instead of an exhaustive one, based on a pruning technique. Nevertheless, they remain inefficient when dealing with large problem instances because of the impractical running time needed to explore the whole search space. To overcome this issue, we propose in this paper three parallel B&B approaches based on shared memory to exploit multi-core CPUs. First, a work-stealing approach in which several instances of the B&B algorithm explore a single search tree concurrently, achieving speedups of up to 24× over the sequential version. Second, a tree-based approach in which multiple parts of the search tree are explored simultaneously by independent B&B instances, achieving speedups of up to 28×. Finally, due to the irregular nature of the GED problem, two load-balancing strategies are proposed to ensure a fair workload between parallel processes, achieving impressive speedups of up to 300×. All experiments have been carried out on well-known datasets.
Citations: 2
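Note: the abstract above relies on branch and bound replacing exhaustive enumeration with implicit enumeration plus pruning. The sketch below illustrates that mechanism on a much simpler stand-in problem, a minimum-cost assignment of rows to columns; it is not a GED solver and not the authors' code. The function name and the row-minimum lower bound are assumptions chosen for brevity. The explicit stack of open subproblems is the kind of work pool a parallel version would split among workers.

```python
def branch_and_bound_assignment(cost):
    """Find a minimum-cost one-to-one assignment of rows to columns.

    Sequential depth-first branch and bound. A partial assignment is
    discarded when its cost plus an optimistic bound on the remaining
    rows cannot beat the best complete assignment found so far.
    """
    n = len(cost)
    # suffix[i] = sum of the cheapest entry of each row i..n-1.
    # It never overestimates the remaining cost, so pruning with it is safe.
    suffix = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        suffix[i] = suffix[i + 1] + min(cost[i])

    best_cost = float("inf")
    best_cols = None
    stack = [(0, 0, ())]  # (next row to assign, cost so far, columns chosen)
    while stack:
        row, so_far, cols = stack.pop()
        if so_far + suffix[row] >= best_cost:
            continue  # prune: this subtree cannot improve on the incumbent
        if row == n:
            best_cost, best_cols = so_far, cols  # new incumbent
            continue
        for c in range(n):
            if c not in cols:  # branch on every column still free
                stack.append((row + 1, so_far + cost[row][c], cols + (c,)))
    return best_cost, best_cols


cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]
print(branch_and_bound_assignment(cost))  # prints (5, (1, 0, 2))
```

Subtrees under different stack entries are independent apart from the shared incumbent `best_cost`, which is why their sizes vary unpredictably and why the abstract stresses load balancing.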
NekRS, a GPU-accelerated spectral element Navier–Stokes solver
IF 1.4, Zone 4, Computer Science
Parallel Computing, Volume 114, Article 102982 Pub Date : 2022-12-01 DOI: 10.1016/j.parco.2022.102982
Paul Fischer, Stefan Kerkemeier, Misun Min, Yu-Hsiang Lan, Malachi Phillips, Thilina Rathnayake, Elia Merzari, Ananias Tomboulides, Ali Karakus, Noel Chalmers, Tim Warburton
Abstract: The development of NekRS, a GPU-oriented thermal-fluids simulation code based on the spectral element method (SEM), is described. For performance portability, the code is based on the open concurrent compute abstraction and leverages scalable developments in the SEM code Nek5000 and in libParanumal, which is a library of high-performance kernels for high-order discretizations and PDE-based miniapps. Critical performance sections of the Navier–Stokes time advancement are addressed. Performance results on several platforms are presented, including scaling to 27,648 V100s on OLCF Summit, for calculations of up to 60B grid points (240B degrees-of-freedom).
Citations: 53
SGPM: A coroutine framework for transaction processing
IF 1.4, Zone 4, Computer Science
Parallel Computing, Volume 114, Article 102980 Pub Date : 2022-12-01 DOI: 10.1016/j.parco.2022.102980
Xinyuan Wang, Hejiao Huang
Abstract: Coroutines can increase program concurrency and processor core utilization. However, for adapting the coroutine-to-transaction model, existing coroutine packages have the following disadvantages: (1) Additional scheduler threads incur synchronization overhead when the load between scheduler threads and worker threads is unbalanced. (2) Coroutines are swapped out periodically to prevent deadlocks, which increases the conflict rate by adding suspended transactions. (3) Supporting only the swap-out function (yield, await, etc.) cannot flexibly control the transaction swap-in time. In this paper, we present SGPM, a coroutine framework for transaction processing. To adapt to the coroutine-to-transaction model, SGPM has the following properties: First, it eliminates scheduler threads and the periodic coroutine switch. Second, it provides a variety of coroutine scheduling strategies so that all types of concurrency control protocols run reasonably on SGPM. We implement eight well-known concurrency control protocols on SGPM and, in particular, we use SGPM to optimize the performance of four wound-wait concurrency control protocols among them, including 2PL, SS2PL, Calvin, and EWV. The experimental results demonstrate that after SGPM optimization 2PL and SS2PL outperform OCC and MVCC, and the throughput of Calvin and EWV is also improved by 1.2x and 1.3x, respectively.
Citations: 0
Tausch: A halo exchange library for large heterogeneous computing systems using MPI, OpenCL, and CUDA
IF 1.4, Zone 4, Computer Science
Parallel Computing, Volume 114, Article 102973 Pub Date : 2022-12-01 DOI: 10.1016/j.parco.2022.102973
Lukas Spies, Amanda Bienz, David Moulton, Luke Olson, Andrew Reisner
Abstract: Exchanging halo data is a common task in modern scientific computing applications, and efficient handling of this operation is critical for the performance of the overall simulation. Tausch is a novel header-only library that provides a simple API for efficiently handling these types of data movements. Tausch supports simple CPU-only systems as well as more complex heterogeneous systems with both CPUs and GPUs. It currently supports both OpenCL and CUDA for communicating with GPGPU devices, and allows for communication between GPGPUs and CPUs. The API allows for drop-in replacement in existing codes and can be used for the communication layer in new codes. This paper provides an overview of the approach taken in Tausch, and a performance analysis that demonstrates expected and achieved performance. We highlight the ease of use and performance with three applications: first, Tausch is compared to the halo exchange framework from two Mantevo applications, HPCCG and miniFE, and then it is used to replace a legacy halo exchange library in the flexible multigrid solver framework Cedar.
Citations: 1
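Note: for readers unfamiliar with the term, a halo exchange copies each subdomain's boundary cells into the ghost (halo) cells of its neighbours. The sketch below shows the data movement for a one-dimensional decomposition. It does not use the Tausch API or MPI: plain Python lists stand in for per-rank buffers, and the name `exchange_halos` and the buffer layout (ghost cells at both ends, at least `halo` owned cells per subdomain) are assumptions made for illustration.

```python
def exchange_halos(subdomains, halo=1):
    """Fill ghost cells of neighbouring 1-D subdomains in place.

    Each subdomain is a list laid out as
        [halo ghost cells | owned cells | halo ghost cells].
    Only ghost cells are written; owned cells are read but never modified.
    The outermost ghost cells (physical boundary) are left untouched.
    """
    for left, right in zip(subdomains, subdomains[1:]):
        # The right neighbour's left ghost cells receive the left
        # subdomain's last owned cells.
        right[:halo] = left[-2 * halo:-halo]
        # The left neighbour's right ghost cells receive the right
        # subdomain's first owned cells.
        left[-halo:] = right[halo:2 * halo]
    return subdomains


a = [0, 1, 2, 3, 0]   # owned cells: 1, 2, 3
b = [0, 4, 5, 6, 0]   # owned cells: 4, 5, 6
c = [0, 7, 8, 9, 0]   # owned cells: 7, 8, 9
exchange_halos([a, b, c])
print(a, b, c)
# prints [0, 1, 2, 3, 4] [3, 4, 5, 6, 7] [6, 7, 8, 9, 0]
```

In a distributed code each of the two copies becomes a send on one rank and a matching receive on its neighbour, with device-to-host or device-to-device transfers added when the buffers live on a GPU.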
Graph optimization algorithm using symmetry and host bias for low-latency indirect network
IF 1.4, Zone 4, Computer Science
Parallel Computing, Volume 114, Article 102983 Pub Date : 2022-12-01 DOI: 10.1016/j.parco.2022.102983
Masahiro Nakao, Masaki Tsukamoto, Yoshiko Hanada, Keiji Yamamoto
Abstract: It is known that an indirect network with a small host-to-host Average Shortest Path Length (h-ASPL) improves overall system performance in a parallel computer system. As a means to discuss such indirect networks in graph theory, the Order/Radix Problem (ORP) has been proposed. ORP involves finding a graph with a minimum h-ASPL that satisfies a given number of hosts and radix. A graph in ORP represents an indirect network and has two types of vertices: host and switch. We propose an optimization algorithm to generate graphs with a sufficiently small h-ASPL. The primary features of the proposed algorithm are the symmetry of the graph and the bias of the hosts adjacent to each switch. These features reduce the computational time to calculate the h-ASPL and improve the search performance of the algorithm. The performance of the proposed algorithm is evaluated using problems presented by Graph Golf, an international ORP competition. Our results show that the proposed algorithm can generate graphs with a smaller h-ASPL than the existing algorithm. To evaluate the performance of the graphs generated by the proposed algorithm, we use the parallel simulation framework SimGrid and the parallel benchmark collection NAS Parallel Benchmarks. Our results also show that the graphs generated by the proposed algorithm have higher performance than those generated by the existing algorithm.
Open-access PDF: https://www.sciencedirect.com/science/article/pii/S0167819122000722/pdfft?md5=70b6cbe2b73c6952541b7170b6406471&pid=1-s2.0-S0167819122000722-main.pdf
Citations: 0
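Note: the objective in the abstract above, h-ASPL, can be computed directly for small graphs. The sketch below assumes the usual reading of the term: the mean shortest-path length over all unordered host pairs, where every host hangs off exactly one switch, so two hosts are `d + 2` hops apart when their switches are `d` hops apart. The function name and input layout are hypothetical, and this brute-force version does not use the symmetry the paper exploits to cut evaluation time.

```python
from collections import deque
from itertools import combinations


def h_aspl(switch_adj, host_switch):
    """Host-to-host average shortest path length of an indirect network.

    switch_adj:  dict mapping each switch to a list of neighbouring switches.
    host_switch: list where host_switch[h] is the switch host h attaches to.
    Assumes a connected switch graph and at least two hosts.
    """
    def bfs(src):
        # Unweighted shortest paths from one switch to all others.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in switch_adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return dist

    # Only switches that actually carry hosts need a BFS.
    dist = {s: bfs(s) for s in set(host_switch)}

    total = 0
    pairs = 0
    for a, b in combinations(range(len(host_switch)), 2):
        # +2 accounts for the host-to-switch link at each end of the path.
        total += dist[host_switch[a]][host_switch[b]] + 2
        pairs += 1
    return total / pairs


# Two directly connected switches, two hosts on each.
print(h_aspl({0: [1], 1: [0]}, [0, 0, 1, 1]))  # 16 / 6, about 2.667
```

Hosts on the same switch are 2 hops apart and hosts on different switches are 3 hops apart here, giving (2 + 3 + 3 + 3 + 3 + 2) / 6.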
Operational Data Analytics in practice: Experiences from design to deployment in production HPC environments
IF 1.4, Zone 4, Computer Science
Parallel Computing, Volume 113, Article 102950 Pub Date : 2022-10-01 DOI: 10.1016/j.parco.2022.102950
Alessio Netti, Michael Ott, Carla Guillen, Daniele Tafani, Martin Schulz
Abstract: As HPC systems continue to grow in scale and complexity, efficient and manageable operation is increasingly critical. For this reason, many centers are starting to explore the use of Operational Data Analytics (ODA) techniques, which extract knowledge from the massive amounts of data produced by monitoring systems and use it for enacting control over system knobs, or for aiding administrators through visualization. As ODA is a multi-faceted problem, much research effort has gone into finding solutions to its separate aspects; however, comprehensive solutions to enable production use of ODA are still rare, while accounts of ODA experiences and the associated challenges are even harder to come across.
In this work we aim to bridge the gap between ODA research and production use by presenting our own experiences, associated with proactive control of warm-water inlet temperatures and visualization of job data on two different HPC systems. We cover the entire development process, starting from a description of requirements and challenges, and continuing through design, deployment and evaluation. Moreover, we discuss a series of critical points related to the maintainability of ODA, and propose action items in an effort to drive the community forward. We rely on a series of open-source tools and techniques, which make for a generic ODA framework that is suitable for most use cases.
Citations: 3
Accelerating communication for parallel programming models on GPU systems
IF 1.4, Zone 4, Computer Science
Parallel Computing, Volume 113, Article 102969 Pub Date : 2022-10-01 DOI: 10.1016/j.parco.2022.102969
Jaemin Choi, Zane Fink, Sam White, Nitin Bhat, David F. Richards, Laxmikant V. Kale
Abstract: As an increasing number of leadership-class systems embrace GPU accelerators in the race towards exascale, efficient communication of GPU data is becoming one of the most critical components of high-performance computing. For developers of parallel programming models, implementing support for GPU-aware communication using native APIs for GPUs such as CUDA can be a daunting task as it requires considerable effort with little guarantee of performance. In this work, we demonstrate the capability of the Unified Communication X (UCX) framework to compose a GPU-aware communication layer that serves multiple parallel programming models of the Charm++ ecosystem: Charm++, Adaptive MPI (AMPI), and Charm4py. We demonstrate the performance impact of our designs with microbenchmarks adapted from the OSU benchmark suite, obtaining improvements in latency of up to 10.1x in Charm++, 11.7x in AMPI, and 17.4x in Charm4py. We also observe increases in bandwidth of up to 10.1x in Charm++, 10x in AMPI, and 10.5x in Charm4py. We show the potential impact of our designs on real-world applications by evaluating a proxy application for the Jacobi iterative method, improving the communication performance by up to 12.4x in Charm++, 12.8x in AMPI, and 19.7x in Charm4py.
Citations: 2