Effect of grain size on the performance of systolic array
Authors: Ravi Varadaragan, Bhavai Ravichandran
Venue: ACM-SE 28 (published 1990-04-01)
DOI: 10.1145/98949.99041 (https://doi.org/10.1145/98949.99041)
Systolic arrays belong to a class of pipelined array architectures useful for hardware implementation of compute-bound algorithms. The design of systolic arrays involves mapping the computations onto processors that operate in a synchronized fashion, in such a way that the data needed in a computation arrive at the right processor at the right time. Automatic or semi-automatic mapping tools are necessary for the synthesis of systolic arrays with different trade-offs between execution time and processor-memory requirements. Lamport [1] used a hyperplane technique for allocating nested iterations onto asynchronous parallel processors. A general technique for mapping nested loop algorithms onto multi-dimensional systolic arrays was proposed by Moldovan and Fortes [3]. A similar technique was proposed by Lee and Kedem [2] for mapping nested loops onto linear arrays based on Lamport’s [1] hyperplane method. When each iteration in the nested loop has a number of statements or operations, none of these techniques addresses the issue of how to partition each iteration into proper grain sizes to allow partial overlapping of dependent iterations. In our work, we have demonstrated how different grain sizes chosen for partitioning the iterations can affect the performance of systolic arrays as measured by execution time, buffer storage, processor utilization, etc. Our results indicate the need for addressing the issue of choosing proper grain sizes in mapping algorithms onto systolic arrays. We focus our attention on mapping nested loop algorithms that can be synchronized either at the statement level or at the iteration level. Existing techniques for mapping do not allow asynchronous execution of iterations. The dependencies that ex-
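
The effect of grain size on the overlap of dependent iterations can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the model used in the paper: each iteration runs on its own processor, every statement takes one time unit, grain j of an iteration may start only after grain j of the previous iteration has finished, and each grain pays a fixed synchronization cost. The function name `makespan` and the `sync_cost` parameter are hypothetical, introduced only for this illustration.

```python
def makespan(num_iterations, stmts_per_iteration, grain, sync_cost=0):
    """Finish time of a chain of dependent iterations in a toy model.

    Assumptions (illustrative only, not taken from the paper):
      - one processor per iteration, one time unit per statement;
      - each iteration is cut into equal grains of `grain` statements;
      - grain j of iteration i waits for grain j of iteration i-1
        and for grain j-1 of iteration i;
      - every grain pays `sync_cost` extra time units for synchronization.
    """
    if grain <= 0 or stmts_per_iteration % grain != 0:
        raise ValueError("grain must be a positive divisor of stmts_per_iteration")

    grains = stmts_per_iteration // grain
    step = grain + sync_cost          # time to execute one grain
    prev = [0] * grains               # finish times of the previous iteration's grains

    for _ in range(num_iterations):
        cur = []
        t = 0                         # finish time of the previous grain in this iteration
        for j in range(grains):
            start = max(t, prev[j])   # wait for both dependencies
            t = start + step
            cur.append(t)
        prev = cur

    return prev[-1]


if __name__ == "__main__":
    # Compare grain sizes for 4 dependent iterations of 6 statements each.
    for g in (1, 2, 3, 6):
        print(g, makespan(4, 6, g), makespan(4, 6, g, sync_cost=2))
```

In this model, a grain equal to the whole iteration (iteration-level synchronization) allows no overlap, while a grain of one statement (statement-level synchronization) maximizes overlap but pays the synchronization cost most often. With a nonzero `sync_cost`, an intermediate grain size can give the shortest finish time, which is the kind of trade-off the abstract points to.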