SIMD code generation for stencils on brick decompositions

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. Pub Date: 2018-02-10. DOI: 10.1145/3178487.3178537

Tuowen Zhao, Mary W. Hall, P. Basu, Samuel Williams, H. Johansen

Citations: 4

Abstract

We present a stencil library and associated compiler code generation framework designed to maximize performance on higher-order stencil computations through the use of two main technologies: a fine-grained brick data layout designed to exploit the inherent multidimensional spatial locality endemic to stencil computations, and a vector scatter associative reordering transformation that reduces vector loads and alignment operations and exposes opportunities for the backend compiler to reduce computation. For a range of stencil computations, we compare the generated code expressed in the brick library to the standard tiled code. We attain up to a 7.2X speedup on the most complex stencils when running on an Intel Knights Landing (Xeon Phi) processor.
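To make the brick layout idea concrete, the following is a minimal sketch, not the paper's library or its generated code. It assumes a cubic, periodic grid of edge length `n` that is divisible by a brick edge `B = 4`; the value of `B` and every function name here are hypothetical and chosen only for illustration. It stores each `B x B x B` brick contiguously and applies a 7-point stencil through an index function, so the same stencil code runs on either layout. It does not attempt the vector reordering transformation or any SIMD code generation.

```python
B = 4  # brick edge length; an assumed value for illustration only


def row_major_index(x, y, z, n):
    """Address of (x, y, z) in a conventional row-major n*n*n array."""
    return (z * n + y) * n + x


def brick_index(x, y, z, n):
    """Address of (x, y, z) when each B*B*B brick is stored contiguously."""
    nb = n // B  # bricks per dimension
    brick_id = ((z // B) * nb + y // B) * nb + x // B
    offset = ((z % B) * B + y % B) * B + x % B  # position inside the brick
    return brick_id * B ** 3 + offset


NEIGHBORS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def laplacian_7pt(data, n, index):
    """7-point stencil with periodic boundaries; `index` selects the layout."""
    out = [0.0] * (n ** 3)
    for z in range(n):
        for y in range(n):
            for x in range(n):
                acc = -6.0 * data[index(x, y, z, n)]
                for dx, dy, dz in NEIGHBORS:
                    acc += data[index((x + dx) % n, (y + dy) % n, (z + dz) % n, n)]
                out[index(x, y, z, n)] = acc
    return out


def footprint_span(x, y, z, n, index):
    """Distance between the lowest and highest address one stencil point reads."""
    addrs = [index(x, y, z, n)]
    addrs += [index((x + dx) % n, (y + dy) % n, (z + dz) % n, n)
              for dx, dy, dz in NEIGHBORS]
    return max(addrs) - min(addrs)
```

For a point in the interior of a brick, `footprint_span` depends only on `B` in the brick layout, whereas in the row-major layout it grows with `n * n`. That is one simple way to see the multidimensional locality the abstract refers to; points on a brick boundary still reach into neighboring bricks.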



