A study of scalar compilation techniques for pipelined supercomputers

Proceedings of the second international conference on Architectural support for programming languages and operating systems. Pub Date: 1987-10-01. DOI: 10.1145/36206.36191

S. Weiss, James E. Smith


Citations: 85

Abstract




This paper studies two compilation techniques for enhancing scalar performance in high-speed scientific processors: software pipelining and loop unrolling. We study how the architecture (register file size) and the hardware implementation (instruction buffer size) affect the efficiency of loop unrolling. We also develop a methodology for classifying software pipelining techniques. For loop unrolling, a straightforward scheduling algorithm is shown to produce near-optimal results when it is not inhibited by recurrences or memory hazards. Software pipelining requires less hardware than loop unrolling but also achieves less speedup. Finally, we show that a modified CRAY-1S scalar architecture, combined with a code scheduler that uses loop unrolling, performs comparably to a CRAY-1S with a vector unit and the CFT vectorizing compiler.
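
The two transformations compared above can be illustrated on a DAXPY-style loop (`y[i] = a*x[i] + y[i]`). This kernel is chosen here for illustration only; the paper's benchmark loops are not reproduced on this page. The sketch is written in Python purely to show the shape of each transformation a compiler would apply to machine code. The function names, the unroll factor of 4, and the one-iteration-ahead pipelining depth are assumptions, not details taken from the paper. The unrolled version keeps eight temporaries live and duplicates the body, which hints at why register file size and instruction buffer size matter; the pipelined version keeps fewer temporaries and does not duplicate the body. Reordering loads ahead of stores is legal here because iteration i writes only `y[i]`; a loop in which one iteration reads a value an earlier one wrote would introduce the recurrence or memory hazard the abstract mentions.

```python
def daxpy(a, x, y):
    """Reference loop: y[i] = a*x[i] + y[i], one iteration at a time."""
    for i in range(len(x)):
        y[i] = a * x[i] + y[i]


def daxpy_unrolled4(a, x, y):
    """Unrolled by 4 (illustrative factor): each trip holds four independent
    copies of the body. Every copy needs its own temporaries (registers on a
    real machine), and the larger body takes more instruction space."""
    n = len(x)
    i = 0
    while i + 4 <= n:
        # On a pipelined machine these eight loads could issue back to back ...
        x0, x1, x2, x3 = x[i], x[i + 1], x[i + 2], x[i + 3]
        y0, y1, y2, y3 = y[i], y[i + 1], y[i + 2], y[i + 3]
        # ... and the four multiply-adds could overlap, since no copy
        # uses another copy's result.
        y[i] = a * x0 + y0
        y[i + 1] = a * x1 + y1
        y[i + 2] = a * x2 + y2
        y[i + 3] = a * x3 + y3
        i += 4
    # Remainder loop for the n % 4 leftover iterations.
    while i < n:
        y[i] = a * x[i] + y[i]
        i += 1


def daxpy_pipelined(a, x, y):
    """Software pipelined (one iteration ahead): the body issues the loads for
    iteration i+1 alongside the multiply-add and store of iteration i, so on a
    pipelined machine they can overlap without duplicating the body."""
    n = len(x)
    if n == 0:
        return
    # Prologue: start iteration 0 by loading its operands.
    xc, yc = x[0], y[0]
    for i in range(n - 1):
        # Kernel: load operands for iteration i+1 ...
        xn, yn = x[i + 1], y[i + 1]
        # ... while finishing iteration i.
        y[i] = a * xc + yc
        xc, yc = xn, yn
    # Epilogue: finish the last iteration.
    y[n - 1] = a * xc + yc
```

Run on copies of the same input, all three functions produce identical results; they differ only in how much independent work each loop body exposes to the scheduler and how many temporaries it keeps live.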

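The abstract's caveat that the simple scheduler is near-optimal only when it is not inhibited by recurrences can be seen by computing the latency-weighted critical path of an unrolled loop body, which is a lower bound on any schedule's length. The sketch below is that lower-bound calculation, not the paper's scheduling algorithm. The latency table, operation names, and helper functions are hypothetical. It also assumes unlimited functional units and issue slots, and no memory dependences between unrolled copies.

```python
# Hypothetical latencies in clock cycles (illustrative values, not from the paper).
LATENCY = {"load": 11, "mul": 7, "add": 6, "store": 1}


def critical_path(ops):
    """ops: list of (name, kind, deps) in topological order.
    Returns the longest latency-weighted path through the dependence DAG,
    ignoring issue-rate and functional-unit limits."""
    finish = {}
    for name, kind, deps in ops:
        start = max((finish[d] for d in deps), default=0)
        finish[name] = start + LATENCY[kind]
    return max(finish.values(), default=0)


def unrolled_daxpy(k):
    """k copies of y[i] = a*x[i] + y[i]; the copies share no values."""
    ops = []
    for j in range(k):
        ops += [
            (f"lx{j}", "load", []),
            (f"ly{j}", "load", []),
            (f"m{j}", "mul", [f"lx{j}"]),
            (f"a{j}", "add", [f"m{j}", f"ly{j}"]),
            (f"s{j}", "store", [f"a{j}"]),
        ]
    return ops


def unrolled_recurrence(k):
    """k copies of y[i] = a*y[i-1] + x[i]; copy j's multiply needs copy j-1's add.
    The value carried in from the previous trip is assumed to be in a register."""
    ops = []
    for j in range(k):
        mul_deps = [f"a{j - 1}"] if j > 0 else []
        ops += [
            (f"lx{j}", "load", []),
            (f"m{j}", "mul", mul_deps),
            (f"a{j}", "add", [f"m{j}", f"lx{j}"]),
            (f"s{j}", "store", [f"a{j}"]),
        ]
    return ops
```

With these assumed latencies, four unrolled DAXPY copies still have a 25-cycle critical path (about 6 cycles per original iteration), whereas four copies of the recurrence chain to 57 cycles because each multiply waits for the previous add. The recurrence's cost per original iteration approaches the multiply-plus-add latency (13 cycles here) however far the loop is unrolled. For DAXPY, once k is large, the issue-rate and functional-unit limits that this calculation ignores become the binding constraint.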
