Effect of grain size on the performance of systolic array

ACM-SE 28, Pub Date: 1990-04-01, DOI: 10.1145/98949.99041

Ravi Varadaragan, Bhavai Ravichandran


Citations: 0

Abstract



Systolic arrays belong to a class of pipelined array architectures useful for hardware implementation of compute-bound algorithms. The design of systolic arrays involves mapping the computations onto processors that operate in a synchronized fashion, in such a way that the data needed in a computation arrive at the right processor at the right time. Automatic or semi-automatic mapping tools are necessary for the synthesis of systolic arrays with different trade-offs between execution time and processor-memory requirements. Lamport [1] used a hyperplane technique for allocating nested iterations onto asynchronous parallel processors. A general technique for mapping nested loop algorithms onto multi-dimensional systolic arrays was proposed by Moldovan and Fortes [3]. A similar technique was proposed by Lee and Kedem [2] for mapping nested loops onto linear arrays, based on Lamport's [1] hyperplane method. When each iteration in the nested loop has a number of statements or operations, none of these techniques addresses how to partition each iteration into proper grain sizes to allow partial overlapping of dependent iterations. In our work, we demonstrate how the grain size chosen for partitioning the iterations can affect the performance of systolic arrays, as measured by execution time, buffer storage, processor utilization, etc. Our results indicate the need to address the choice of proper grain sizes when mapping algorithms onto systolic arrays. We focus on mapping nested loop algorithms that can be synchronized either at the statement level or at the iteration level. Existing mapping techniques do not allow asynchronous execution of iterations. The dependencies that ex-
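The two ideas named in the abstract, hyperplane scheduling and grain size, can be illustrated with a small sketch. This is not the authors' model. It assumes a linear time function `pi` in the style of the hyperplane method (every dependence vector must advance time by at least one step), and a deliberately simple, hypothetical cost model in which each statement takes one time unit and a dependent iteration may start once the grain containing the statement it needs has completed. All function and parameter names are invented for illustration.

```python
from itertools import product
from math import ceil


def is_valid(pi, deps):
    """A time hyperplane pi is usable if pi . d >= 1 for every dependence d."""
    return all(sum(p * d for p, d in zip(pi, dep)) >= 1 for dep in deps)


def hyperplane_schedule(bounds, pi):
    """Group the points of a rectangular loop nest by time step t = pi . point.

    Points that share a time step lie on the same hyperplane and could run
    concurrently under this schedule.
    """
    steps = {}
    for point in product(*(range(n) for n in bounds)):
        t = sum(p * c for p, c in zip(pi, point))
        steps.setdefault(t, []).append(point)
    return steps


def chain_time(n_iters, n_stmts, produced_at, grain):
    """Toy estimate of the finish time of a chain of dependent iterations.

    Assumptions (hypothetical, not taken from the paper):
      - each statement costs one time unit;
      - iteration k+1 needs the value produced by statement number
        `produced_at` (1-based) of iteration k;
      - synchronization happens only at grain boundaries, so the value is
        released when the grain containing that statement finishes.
    grain == n_stmts models iteration-level synchronization;
    grain == 1 models statement-level synchronization.
    """
    if grain < 1 or not 1 <= produced_at <= n_stmts:
        raise ValueError("grain must be >= 1 and 1 <= produced_at <= n_stmts")
    # Delay before the next iteration may start, capped at one full iteration.
    delay = min(ceil(produced_at / grain) * grain, n_stmts)
    return (n_iters - 1) * delay + n_stmts


if __name__ == "__main__":
    deps = [(1, 0), (0, 1)]          # dependences of a simple 2-D loop nest
    print(is_valid((1, 1), deps))    # diagonal wavefront schedule
    print(hyperplane_schedule((3, 3), (1, 1)))
    for g in (4, 3, 2, 1):
        print(g, chain_time(n_iters=5, n_stmts=4, produced_at=2, grain=g))
```

Under these assumptions, a chain of 5 iterations with 4 statements each, where the needed value appears after statement 2, takes 20 time units with iteration-level grains (`grain=4`) and 12 with `grain=2`. Finer grains allow more overlap in this toy model; it ignores synchronization overhead and buffer storage, which the abstract names as part of the trade-off.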
