“Combining” as a compilation technique for VLIW architectures

MICRO 22 Pub Date : 1989-08-01 DOI:10.1145/75362.75401

T. Nakatani, K. Ebcioglu


Citations: 33

Abstract

Combining is a local compiler optimization technique that can enhance the performance of global compaction techniques for VLIW machines. Given two adjacent operations of a certain class that are flow (read-after-write) dependent and that cannot be placed in the same micro-instruction, the combining technique can transform the operations so that the modified operations have no dependence. The transformed operations can be executed in the same micro-instruction, thus allowing the total execution time of the program to be reduced. In this paper, combining a pair of flow-dependent operations into a wide instruction word is suggested as an important compilation technique for VLIW architectures. Combining is particularly effective with software pipelining and loop unrolling since combinable operations can come together with a higher probability when these compilation techniques are used. We implemented combining in our parallelizing compiler for the wide instruction word architecture, which is now being built at the IBM T. J. Watson Research Center. It is shown that a ten percent speedup is obtained on the Stanford integer benchmarks and other sequential-matured C programs, in comparison to compaction techniques that do not use combining. For a class of inner loops, combining can remove the inter-iteration dependencies completely and can improve performance in the same ratio as the loop is unrolled.
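
The abstract does not list the operation classes that can be combined. As a minimal sketch, assume one such class is a pair of add-immediate operations in which the second reads the result of the first. The `AddImm` type, the `combine` function, and the register names below are hypothetical and chosen only for illustration; they are not taken from the paper's compiler. The sketch rewrites the second operation to read the first operation's source, folds the two constants together, and then checks that executing both operations in one wide instruction (all sources read before any destination is written) gives the same result as the original sequential pair.

```python
from typing import NamedTuple, Optional


class AddImm(NamedTuple):
    """Hypothetical operation: dst = src + imm."""
    dst: str
    src: str
    imm: int


def combine(first: AddImm, second: AddImm) -> Optional[AddImm]:
    """Rewrite `second` so it no longer reads the result of `first`.

    Returns None when the pair is not combinable under this sketch's rules.
    """
    if second.src != first.dst:
        return None  # no flow dependence, nothing to remove
    if second.dst == first.dst:
        return None  # both would write the same register in one word
    # second computed (first.src + first.imm) + second.imm; fold the constants.
    return AddImm(second.dst, first.src, first.imm + second.imm)


def run_sequential(ops, regs):
    """Execute operations one after another, as in the original code."""
    regs = dict(regs)
    for op in ops:
        regs[op.dst] = regs[op.src] + op.imm
    return regs


def run_parallel(ops, regs):
    """Execute operations as one wide instruction: read all, then write all."""
    results = [(op.dst, regs[op.src] + op.imm) for op in ops]
    new_regs = dict(regs)
    new_regs.update(results)
    return new_regs


initial = {"r1": 0, "r2": 10, "r3": 0}
first = AddImm("r1", "r2", 4)    # r1 = r2 + 4
second = AddImm("r3", "r1", 8)   # r3 = r1 + 8, flow dependent on first
rewritten = combine(first, second)  # r3 = r2 + 12, independent of first

before = run_sequential([first, second], initial)
after = run_parallel([first, rewritten], initial)
assert before == after
```

The same rewrite applied along a chain such as an induction variable incremented in each copy of an unrolled loop body makes every copy depend only on the value at loop entry, which is one way to read the abstract's claim about removing inter-iteration dependencies. A real implementation would also have to check that the folded constant fits the machine's immediate field, which this sketch ignores.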
