Performance Portability of Sparse Block Diagonal Matrix Multiple Vector Multiplications on GPUs

K. Ibrahim, Chao Yang, Pieter Maris
{"title":"Performance Portability of Sparse Block Diagonal Matrix Multiple Vector Multiplications on GPUs","authors":"K. Ibrahim, Chao Yang, Pieter Maris","doi":"10.1109/P3HPC56579.2022.00011","DOIUrl":null,"url":null,"abstract":"The emergence of accelerator-based computer architectures and programming models makes it challenging to achieve performance portability for large-scale scientific simulation software. In this paper, we focus on a sparse block diagonal matrix multiple vector (SpMM) computational kernel and discuss techniques that can be used to achieve performance portability on NVIDIA and AMD based accelerators using CUDA, HIP, OpenACC, Kokkos. We show that performance portability can vary significantly across programming models, GPU architectures, and problem settings, by up to 52× in the explored problems. Our study visits the performance portability aggregation techniques to guide the development and the selection of performance portable algorithmic variants.","PeriodicalId":261766,"journal":{"name":"2022 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/P3HPC56579.2022.00011","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

The emergence of accelerator-based computer architectures and programming models makes it challenging to achieve performance portability for large-scale scientific simulation software. In this paper, we focus on a sparse block diagonal matrix multiple vector (SpMM) computational kernel and discuss techniques that can be used to achieve performance portability on NVIDIA- and AMD-based accelerators using CUDA, HIP, OpenACC, and Kokkos. We show that performance portability can vary significantly across programming models, GPU architectures, and problem settings, by up to 52× in the explored problems. Our study revisits performance portability aggregation techniques to guide the development and selection of performance-portable algorithmic variants.
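For context on the kernel studied here, the following is a minimal CUDA sketch of a block diagonal SpMM, Y = A·X, where A consists of dense square blocks along the diagonal and X carries several right-hand-side vectors. The kernel name (block_diag_spmm), the uniform block size BS, the vector count NVEC, and the row-major storage layout are illustrative assumptions, not the authors' implementation; the paper's CUDA, HIP, OpenACC, and Kokkos variants express this same loop structure through their respective abstractions.

```cuda
// Minimal sketch (assumed layout, not the paper's code): Y = A * X, where A is
// block diagonal with dense BS x BS blocks stored contiguously, and X, Y hold
// NVEC vectors per row in row-major order. One CUDA thread block handles one
// diagonal block; threads cover (row within block, vector index) pairs.

#include <cuda_runtime.h>

constexpr int BS   = 32;   // diagonal block size (assumed uniform)
constexpr int NVEC = 8;    // number of vectors multiplied at once

__global__ void block_diag_spmm(const double* __restrict__ blocks,  // nblocks * BS * BS
                                const double* __restrict__ X,       // nrows * NVEC
                                double* __restrict__ Y,             // nrows * NVEC
                                int nblocks)
{
    int b = blockIdx.x;          // which diagonal block
    if (b >= nblocks) return;

    int row = threadIdx.y;       // row within the block, 0..BS-1
    int v   = threadIdx.x;       // which vector, 0..NVEC-1

    const double* Ab = blocks + (size_t)b * BS * BS;
    const double* Xb = X + (size_t)b * BS * NVEC;
    double*       Yb = Y + (size_t)b * BS * NVEC;

    // Stage this block's slice of X in shared memory; every row of the block
    // reuses all BS staged entries of its vector column.
    __shared__ double xs[BS][NVEC];
    for (int r = row; r < BS; r += blockDim.y)
        xs[r][v] = Xb[r * NVEC + v];
    __syncthreads();

    double acc = 0.0;
    for (int k = 0; k < BS; ++k)
        acc += Ab[row * BS + k] * xs[k][v];
    Yb[row * NVEC + v] = acc;
}
```

A launch of the form block_diag_spmm<<<nblocks, dim3(NVEC, BS)>>>(blocks, X, Y, nblocks) would cover all diagonal blocks; mapping vectors to threadIdx.x keeps accesses to X and Y coalesced across the multiple right-hand sides, which is one of the layout choices such a kernel must get right on both NVIDIA and AMD GPUs.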