Autotuning Stencil-Based Computations on GPUs

A. Mametjanov, Daniel Lowell, Ching-Chen Ma, B. Norris
2012 IEEE International Conference on Cluster Computing (CLUSTER 2012), published 2012-09-24. DOI: 10.1109/CLUSTER.2012.46
Citations: 48

Abstract

Finite-difference, stencil-based discretization approaches are widely used in the solution of partial differential equations describing physical phenomena. Newton-Krylov iterative methods commonly used in stencil-based solutions generate matrices that exhibit diagonal sparsity patterns. To exploit these structures on modern GPUs, we extend the standard diagonal sparse matrix representation and define new matrix and vector data types in the PETSc parallel numerical toolkit. We create tunable CUDA implementations of the operations associated with these types after identifying a number of GPU-specific optimizations and tuning parameters for these operations. We discuss our implementation of GPU autotuning capabilities in the Orio framework and present performance results for several kernels, comparing them with vendor-tuned library implementations.
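The link between stencils and diagonal sparsity can be made concrete with a small example. The sketch below assumes a 5-point Laplacian on a structured 2D grid with natural row-major ordering. It shows how that stencil maps onto the standard DIA (diagonal) format the abstract starts from, together with a reference sparse matrix-vector product over it. This is not the authors' extended format, whose details are not given here. The function names (`laplacian_5pt_dia`, `dia_spmv`, `dense_from_dia`) are hypothetical. The row-indexed layout `data[k][i] = A[i][i + offsets[k]]` is likewise an assumed convention, chosen because it is a common GPU-friendly choice. The code is plain Python rather than CUDA so that it runs anywhere; the loop over rows `i` stands in for what a GPU kernel would assign to threads.

```python
"""Standard DIA (diagonal) sparse format for a 5-point 2D Laplacian stencil.

Assumed row-indexed layout: data[k][i] holds A[i][i + offsets[k]], and
entries that fall outside the stencil or the matrix are stored as 0.0 padding.
"""


def laplacian_5pt_dia(nx, ny):
    """Build the 5-point Laplacian on an nx-by-ny grid (Dirichlet boundary,
    row-major ordering i = x + nx * y) directly in DIA form."""
    n = nx * ny
    offsets = [-nx, -1, 0, 1, nx]           # south, west, centre, east, north
    data = [[0.0] * n for _ in offsets]
    for i in range(n):
        x, y = i % nx, i // nx
        data[2][i] = 4.0                    # centre point
        if x > 0:
            data[1][i] = -1.0               # west neighbour, column i - 1
        if x < nx - 1:
            data[3][i] = -1.0               # east neighbour, column i + 1
        if y > 0:
            data[0][i] = -1.0               # south neighbour, column i - nx
        if y < ny - 1:
            data[4][i] = -1.0               # north neighbour, column i + nx
    return offsets, data


def dia_spmv(offsets, data, x):
    """Compute y = A x for a square matrix stored in row-indexed DIA format.

    For a fixed diagonal k, consecutive rows i read consecutive data[k][i]
    and x[i + off]; on a GPU, mapping i to threads gives coalesced loads.
    """
    n = len(x)
    y = [0.0] * n
    for k, off in enumerate(offsets):
        row = data[k]
        for i in range(n):
            j = i + off
            if 0 <= j < n:                  # skip positions outside the matrix
                y[i] += row[i] * x[j]
    return y


def dense_from_dia(offsets, data, n):
    """Expand a DIA matrix to a dense list of lists (used only for checking)."""
    a = [[0.0] * n for _ in range(n)]
    for k, off in enumerate(offsets):
        for i in range(n):
            j = i + off
            if 0 <= j < n:
                a[i][j] = data[k][i]
    return a


if __name__ == "__main__":
    nx, ny = 4, 3
    n = nx * ny
    offsets, data = laplacian_5pt_dia(nx, ny)
    x = [float(i) for i in range(n)]
    y = dia_spmv(offsets, data, x)
    # Cross-check the DIA product against a dense matrix-vector product.
    a = dense_from_dia(offsets, data, n)
    y_ref = [sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
    assert y == y_ref
    print(y)
```

Every stencil neighbour lands on a fixed diagonal, so the matrix needs only the five offsets plus one value array per diagonal. No per-entry column indices are required, unlike in CSR. The cost of that regularity is zero padding at grid-row boundaries, for example the missing east neighbour of the last point in each grid row.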