Autotuning Stencil-Based Computations on GPUs

2012 IEEE International Conference on Cluster Computing. Publication date: 2012-09-24. DOI: 10.1109/CLUSTER.2012.46

A. Mametjanov, Daniel Lowell, Ching-Chen Ma, B. Norris

Citations: 48

Abstract

Finite-difference, stencil-based discretization approaches are widely used in the solution of partial differential equations describing physical phenomena. Newton-Krylov iterative methods commonly used in stencil-based solutions generate matrices that exhibit diagonal sparsity patterns. To exploit these structures on modern GPUs, we extend the standard diagonal sparse matrix representation and define new matrix and vector data types in the PETSc parallel numerical toolkit. We create tunable CUDA implementations of the operations associated with these types after identifying a number of GPU-specific optimizations and tuning parameters for these operations. We discuss our implementation of GPU autotuning capabilities in the Orio framework and present performance results for several kernels, comparing them with vendor-tuned library implementations.
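To make the "diagonal sparsity pattern" concrete, here is a minimal sketch of the *standard* diagonal (DIA) storage format and its matrix-vector product, applied to a 1-D three-point stencil. It assumes zero-padded diagonals stored row-aligned (entry `data[d][i]` holds `A[i][i + offsets[d]]`). It is not the paper's extended format, its PETSc matrix/vector types, or its CUDA kernels; the names `DiaMatrix`, `dia_spmv`, and `laplacian_1d` are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DiaMatrix:
    """Standard DIA (diagonal) sparse storage for an n x n matrix.

    offsets[d] is the offset of stored diagonal d (0 = main, -1 = sub,
    +1 = super). data[d][i] holds A[i][i + offsets[d]]; slots that fall
    outside the matrix are padded with zeros.
    """
    n: int
    offsets: List[int]
    data: List[List[float]]


def dia_spmv(a: DiaMatrix, x: List[float]) -> List[float]:
    """Compute y = A @ x from DIA storage.

    The outer loop over rows is what a GPU kernel would typically map to
    one thread per row: neighbouring threads then read neighbouring
    entries of each stored diagonal, i.e. contiguous memory.
    """
    y = [0.0] * a.n
    for i in range(a.n):
        s = 0.0
        for off, diag in zip(a.offsets, a.data):
            j = i + off  # column index touched by this diagonal
            if 0 <= j < a.n:  # skip padded slots outside the matrix
                s += diag[i] * x[j]
        y[i] = s
    return y


def laplacian_1d(n: int) -> DiaMatrix:
    """Three-point stencil [-1, 2, -1] on n grid points, stored as 3 diagonals."""
    sub = [0.0] + [-1.0] * (n - 1)   # row 0 has no sub-diagonal entry
    main = [2.0] * n
    sup = [-1.0] * (n - 1) + [0.0]   # last row has no super-diagonal entry
    return DiaMatrix(n, [-1, 0, 1], [sub, main, sup])


if __name__ == "__main__":
    A = laplacian_1d(5)
    print(dia_spmv(A, [1.0, 2.0, 3.0, 4.0, 5.0]))  # [0.0, 0.0, 0.0, 0.0, 6.0]
```

A stencil matrix stores only a handful of diagonals regardless of grid size, so DIA needs no per-entry column indices, unlike CSR; that regularity is what makes the format attractive for GPUs.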
