Implicit and explicit optimizations for stencil computations

Workshop on Memory System Performance and Correctness · Pub Date: 2006-10-22 · DOI: 10.1145/1178597.1178605

S. Kamil, K. Datta, Samuel Williams, L. Oliker, J. Shalf, K. Yelick

Citations: 153

Abstract

Stencil-based kernels constitute the core of many scientific applications on block-structured grids. Unfortunately, these codes achieve a low fraction of peak performance, due primarily to the disparity between processor and main memory speeds. We examine several optimizations on both the conventional cache-based memory systems of the Itanium 2, Opteron, and Power5 and the heterogeneous multicore design of the Cell processor. The optimizations target cache reuse across stencil sweeps, including both an implicit cache-oblivious approach and a cache-aware algorithm blocked to match the cache structure. Finally, we consider stencil computations on the Cell processor, whose memory hierarchy is explicitly managed. Overall, results show that a cache-aware approach is significantly faster than a cache-oblivious approach and that the explicitly managed memory on Cell is more efficient: relative to the Power5, Cell has almost 2x more memory bandwidth and is 3.7x faster.
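To make the two cache-reuse strategies concrete, here is a minimal sketch in Python, assuming a 1D three-point Jacobi stencil with fixed boundary values; the paper's own kernels, grids, and tuning are not reproduced here. The helper names (`kernel`, `naive`, `time_skewed`, `cache_oblivious`) and the block size `block` are hypothetical. `time_skewed` shows one common cache-aware scheme (time skewing): space is tiled into blocks of `block` points whose edges lean left by one point per time step, so each tile can be advanced through all time steps while its data stays cache resident, and `block` is the parameter that would be matched to the cache size. `cache_oblivious` follows the trapezoidal space/time-cut recursion of Frigo and Strumpen, which needs no cache parameter.

```python
import random


def kernel(u, t, x):
    # Advance point x from time t to t+1 with a 3-point weighted average.
    # u holds two buffers indexed by time parity: u[t % 2] is time t.
    # Two buffers suffice for any dependency-respecting order: writing
    # u[t+1][x] overwrites u[t-1][x], whose only readers (the updates of
    # x-1, x, x+1 at time t-1) must already have run.
    src, dst = u[t % 2], u[(t + 1) % 2]
    dst[x] = 0.25 * src[x - 1] + 0.5 * src[x] + 0.25 * src[x + 1]


def make_grid(initial):
    # Boundary cells 0 and n-1 are fixed, so both buffers start as copies.
    return [list(initial), list(initial)]


def naive(initial, steps):
    # One full sweep over the grid per time step: no reuse across sweeps.
    u, n = make_grid(initial), len(initial)
    for t in range(steps):
        for x in range(1, n - 1):
            kernel(u, t, x)
    return u[steps % 2]


def time_skewed(initial, steps, block):
    # Cache-aware: tile j covers [j*block - t, (j+1)*block - t) at time t.
    # Tiles lean left as t grows, so processing them left to right, each
    # one through all time steps, respects every dependency.
    u, n = make_grid(initial), len(initial)
    for j in range((n + steps) // block + 1):
        for t in range(steps):
            lo = max(1, j * block - t)
            hi = min(n - 1, (j + 1) * block - t)
            for x in range(lo, hi):
                kernel(u, t, x)
    return u[steps % 2]


def cache_oblivious(initial, steps):
    # Cache-oblivious: recursively cut the space-time trapezoid whose
    # bottom edge is [x0, x1) at time t0 and whose edges have slopes dx0, dx1.
    u, n = make_grid(initial), len(initial)

    def walk(t0, t1, x0, dx0, x1, dx1):
        dt = t1 - t0
        if dt == 1:
            for x in range(x0, x1):
                kernel(u, t0, x)
        elif dt > 1:
            if 2 * (x1 - x0) + (dx1 - dx0) * dt >= 4 * dt:
                # Wide enough: space cut along a line of slope -1;
                # the left piece never depends on the right piece.
                xm = (2 * (x0 + x1) + (2 + dx0 + dx1) * dt) // 4
                walk(t0, t1, x0, dx0, xm, -1)
                walk(t0, t1, xm, -1, x1, dx1)
            else:
                # Tall and narrow: time cut, lower half first.
                s = dt // 2
                walk(t0, t0 + s, x0, dx0, x1, dx1)
                walk(t0 + s, t1, x0 + dx0 * s, dx0, x1 + dx1 * s, dx1)

    walk(0, steps, 1, 0, n - 1, 0)
    return u[steps % 2]


if __name__ == "__main__":
    random.seed(0)
    u0 = [random.random() for _ in range(64)]
    ref = naive(u0, 20)
    print(time_skewed(u0, 20, 16) == ref, cache_oblivious(u0, 20) == ref)
    # prints: True True
```

Because each variant only reorders the same per-point updates, the results match bit for bit. In Python the interpreter overhead dominates, so the sketch illustrates the traversal orders, not the speedups reported above.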
