Combining loop fusion with prefetching on shared-memory multiprocessors

Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)
Pub Date: 1997-08-11
DOI: 10.1109/ICPP.1997.622560

N. Manjikian

Citations: 15

Abstract

The performance of programs consisting of parallel loops on shared-memory multiprocessors is limited by long memory latencies as processor speeds increase more rapidly than memory speeds. Two complementary techniques for addressing memory latency and improving performance are: (a) cache locality enhancement for latency reduction and (b) data prefetching for latency tolerance. This paper studies the benefit of combining loop fusion for locality enhancement with prefetching. Experimental results are reported for multiprocessors with support for prefetching. For a complete application on an SGI Power Challenge R10000, combining loop fusion with prefetching improves parallel speedup by 46%.
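To make the two techniques concrete, the following is a minimal sketch and not code from the paper. It assumes a GCC- or Clang-compatible compiler for `__builtin_prefetch`. The function names, the array computation, and the `PREFETCH_DISTANCE` value are hypothetical and chosen only for illustration. The unfused version traverses `b` twice, so for large arrays its elements may be evicted from the cache between the two loops. The fused version uses each `b[i]` right after it is produced (locality enhancement) and issues a prefetch for a later element of `a` (latency tolerance).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical prefetch distance in elements; a real value would be tuned
   to the memory latency and the loop body cost of the target machine. */
#define PREFETCH_DISTANCE 16

/* Unfused: two separate loops, so b[] is written in one pass and read
   again in a second pass. */
void unfused(const double *a, double *b, double *c, size_t n) {
    for (size_t i = 0; i < n; i++)
        b[i] = 2.0 * a[i];
    for (size_t i = 0; i < n; i++)
        c[i] = b[i] + 1.0;
}

/* Fused with software prefetching: one pass over the data. b[i] is
   consumed immediately after it is produced, and a[] is prefetched
   PREFETCH_DISTANCE iterations ahead. */
void fused_prefetch(const double *a, double *b, double *c, size_t n) {
    for (size_t i = 0; i < n; i++) {
#if defined(__GNUC__) || defined(__clang__)
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 3); /* read, keep in cache */
#endif
        b[i] = 2.0 * a[i];
        c[i] = b[i] + 1.0;
    }
}

/* Checks that fusion did not change the computed values. */
void self_check(void) {
    enum { N = 1000 };
    static double a[N], b1[N], c1[N], b2[N], c2[N];
    for (size_t i = 0; i < N; i++)
        a[i] = (double)i;
    unfused(a, b1, c1, N);
    fused_prefetch(a, b2, c2, N);
    for (size_t i = 0; i < N; i++) {
        assert(b1[i] == b2[i]);
        assert(c1[i] == c2[i]);
    }
}
```

Fusion is legal here because iteration `i` of the second loop depends only on iteration `i` of the first. The sketch is sequential; the paper studies parallel loops, where fusion must also preserve the dependences between iterations assigned to different processors.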

