Performance Portability of Sparse Block Diagonal Matrix Multiple Vector Multiplications on GPUs

K. Ibrahim, Chao Yang, Pieter Maris
{"title":"Performance Portability of Sparse Block Diagonal Matrix Multiple Vector Multiplications on GPUs","authors":"K. Ibrahim, Chao Yang, Pieter Maris","doi":"10.1109/P3HPC56579.2022.00011","DOIUrl":null,"url":null,"abstract":"The emergence of accelerator-based computer architectures and programming models makes it challenging to achieve performance portability for large-scale scientific simulation software. In this paper, we focus on a sparse block diagonal matrix multiple vector (SpMM) computational kernel and discuss techniques that can be used to achieve performance portability on NVIDIA and AMD based accelerators using CUDA, HIP, OpenACC, Kokkos. We show that performance portability can vary significantly across programming models, GPU architectures, and problem settings, by up to 52× in the explored problems. Our study visits the performance portability aggregation techniques to guide the development and the selection of performance portable algorithmic variants.","PeriodicalId":261766,"journal":{"name":"2022 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/P3HPC56579.2022.00011","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

The emergence of accelerator-based computer architectures and programming models makes it challenging to achieve performance portability for large-scale scientific simulation software. In this paper, we focus on a sparse block diagonal matrix multiple vector (SpMM) computational kernel and discuss techniques that can be used to achieve performance portability on NVIDIA- and AMD-based accelerators using CUDA, HIP, OpenACC, and Kokkos. We show that performance portability can vary significantly across programming models, GPU architectures, and problem settings, by up to 52× in the explored problems. Our study revisits performance portability aggregation techniques to guide the development and selection of performance-portable algorithmic variants.
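For context on the kernel studied here, the following is a minimal CUDA sketch of a block diagonal SpMM, Y = A·X, where A consists of dense square blocks along the diagonal and X carries several right-hand-side vectors. The kernel name (block_diag_spmm), the uniform block size BS, the vector count NVEC, and the row-major storage layout are illustrative assumptions, not the authors' implementation; the paper's CUDA, HIP, OpenACC, and Kokkos variants express this same loop structure through their respective abstractions.

```cuda
// Minimal sketch (assumed layout, not the paper's code): Y = A * X, where A is
// block diagonal with dense BS x BS blocks stored contiguously, and X, Y hold
// NVEC vectors per row in row-major order. One CUDA thread block handles one
// diagonal block; threads cover (row within block, vector index) pairs.

#include <cuda_runtime.h>

constexpr int BS   = 32;   // diagonal block size (assumed uniform)
constexpr int NVEC = 8;    // number of vectors multiplied at once

__global__ void block_diag_spmm(const double* __restrict__ blocks,  // nblocks * BS * BS
                                const double* __restrict__ X,       // nrows * NVEC
                                double* __restrict__ Y,             // nrows * NVEC
                                int nblocks)
{
    int b = blockIdx.x;          // which diagonal block
    if (b >= nblocks) return;

    int row = threadIdx.y;       // row within the block, 0..BS-1
    int v   = threadIdx.x;       // which vector, 0..NVEC-1

    const double* Ab = blocks + (size_t)b * BS * BS;
    const double* Xb = X + (size_t)b * BS * NVEC;
    double*       Yb = Y + (size_t)b * BS * NVEC;

    // Stage this block's slice of X in shared memory; every row of the block
    // reuses all BS staged entries of its vector column.
    __shared__ double xs[BS][NVEC];
    for (int r = row; r < BS; r += blockDim.y)
        xs[r][v] = Xb[r * NVEC + v];
    __syncthreads();

    double acc = 0.0;
    for (int k = 0; k < BS; ++k)
        acc += Ab[row * BS + k] * xs[k][v];
    Yb[row * NVEC + v] = acc;
}
```

A launch of the form block_diag_spmm<<<nblocks, dim3(NVEC, BS)>>>(blocks, X, Y, nblocks) would cover all diagonal blocks; mapping vectors to threadIdx.x keeps accesses to X and Y coalesced across the multiple right-hand sides, which is one of the layout choices such a kernel must get right on both NVIDIA and AMD GPUs.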