Memory bank disambiguation using modulo unrolling for Raw machines

Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238) Pub Date : 1998-12-17 DOI:10.1109/HIPC.1998.737991

R. Barua, Walter Lee, Saman P. Amarasinghe, A. Agarwal


Citations: 25

Abstract

We present modulo unrolling, a code transformation technique for enabling array references to be accessed through the fast static network on a Raw machine. A Raw machine comprises a mesh of simple, replicated tiles connected by an interconnect that supports fast, static near-neighbor communication. Like all other resources, memory is distributed across the tiles. Management of the memory can be performed by well-known techniques that generate the requisite communication code on distributed address-space architectures. On the other hand, the fast, static network provides the compiler with a simple interface to optimize such communication. This paper addresses the problem of taking advantage of such static communication for memory accesses. The requirement for static memory communication is compile-time knowledge of the exact communication required for each memory reference. This knowledge, in turn, can be obtained if a memory reference refers exclusively to memory residing on a single processing tile. We introduce modulo unrolling as a technique that allows the static communication of a large class of array accesses. We show how this technique achieves the goal of static communication by using a relatively small unroll factor. For a set of dense matrix scientific applications, we are able to access all the array references on the static network, enabling scalable speedups on the Raw machine.
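The idea that unrolling can pin each array reference to a single tile can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the paper's compiler algorithm: it assumes array elements are interleaved element by element across banks (element `i` lives on bank `i % num_banks`), which the abstract does not specify, and the names `bank_of` and `banks_touched_per_reference` are hypothetical helpers introduced here for illustration.

```python
def bank_of(index, num_banks):
    """Bank holding element `index`, under the ASSUMED element-wise interleaving."""
    return index % num_banks


def banks_touched_per_reference(num_banks, unroll_factor, trip_count, stride=1):
    """Model the loop `for i in range(trip_count): use(A[stride * i])`
    unrolled by `unroll_factor`.

    Returns one set per unrolled copy of the reference, containing every
    bank that copy touches over the whole execution of the loop.
    """
    assert trip_count % unroll_factor == 0, "sketch assumes no remainder loop"
    touched = [set() for _ in range(unroll_factor)]
    for base in range(0, trip_count, unroll_factor):
        for copy in range(unroll_factor):
            i = base + copy  # iteration handled by this unrolled copy
            touched[copy].add(bank_of(stride * i, num_banks))
    return touched


# Without unrolling, the single reference A[i] wanders over all four banks,
# so its communication pattern is not known at compile time.
rolled = banks_touched_per_reference(num_banks=4, unroll_factor=1, trip_count=16)

# Unrolled by the number of banks, each copy of the reference always hits
# exactly one bank, so its communication can be scheduled statically.
unrolled = banks_touched_per_reference(num_banks=4, unroll_factor=4, trip_count=16)

print(rolled)
print(unrolled)
```

In this toy model, every unrolled copy maps to a one-element set, which is the "refers exclusively to memory residing on a single processing tile" condition from the abstract. The required unroll factor can also be smaller than the number of banks: with `stride=2` and four banks, an unroll factor of 2 already gives each copy a single bank in this model.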

