关于寄存器移动的有效性，以尽量减少软件流水线循环中的后通展开

2012 International Conference on High Performance Computing & Simulation (HPCS) Pub Date : 2012-07-02 DOI:10.1109/HPCSim.2012.6266972

Mounira Bachir, Albert Cohen, S. Touati

{"title":"关于寄存器移动的有效性，以尽量减少软件流水线循环中的后通展开","authors":"Mounira Bachir, Albert Cohen, S. Touati","doi":"10.1109/HPCSim.2012.6266972","DOIUrl":null,"url":null,"abstract":"Software pipelining is a powerful technique to expose fine-grain parallelism, but it results in variables staying alive across more than one kernel iteration. It requires periodic register allocation and is challenging for code generation: the lack of a reliable solution currently restricts the applicability of software pipelining. The classical software solution that does not alter the computation throughput consists in unrolling the loop a posteriori [11], [10]. However, the resulting unrolling degree is often unacceptable and may reach absurd levels. Alternatively, loop unrolling can be avoided thanks to software register renaming. This is achieved through the insertion of move operations, but this may increase the initiation interval (II) which nullifies the benefits of software pipelining. This article aims at tightly controling the post-pass loop unrolling necessary to generate code. We study the potential of live range splitting to reduce kernel loop unrolling, introducing additional move instructions without inscreasing the II. We provide a complete formalisation of the problem, an algorithm, and extensive experiments. Our algorithm yields low unrolling degrees in most cases - with no increase of the II.","PeriodicalId":428764,"journal":{"name":"2012 International Conference on High Performance Computing & Simulation (HPCS)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"On the effectiveness of register moves to minimise post-pass unrolling in software pipelined loops\",\"authors\":\"Mounira Bachir, Albert Cohen, S. Touati\",\"doi\":\"10.1109/HPCSim.2012.6266972\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Software pipelining is a powerful technique to expose fine-grain parallelism, but it results in variables staying alive across more than one kernel iteration. It requires periodic register allocation and is challenging for code generation: the lack of a reliable solution currently restricts the applicability of software pipelining. The classical software solution that does not alter the computation throughput consists in unrolling the loop a posteriori [11], [10]. However, the resulting unrolling degree is often unacceptable and may reach absurd levels. Alternatively, loop unrolling can be avoided thanks to software register renaming. This is achieved through the insertion of move operations, but this may increase the initiation interval (II) which nullifies the benefits of software pipelining. This article aims at tightly controling the post-pass loop unrolling necessary to generate code. We study the potential of live range splitting to reduce kernel loop unrolling, introducing additional move instructions without inscreasing the II. We provide a complete formalisation of the problem, an algorithm, and extensive experiments. Our algorithm yields low unrolling degrees in most cases - with no increase of the II.\",\"PeriodicalId\":428764,\"journal\":{\"name\":\"2012 International Conference on High Performance Computing & Simulation (HPCS)\",\"volume\":\"40 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-07-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2012 International Conference on High Performance Computing & Simulation (HPCS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HPCSim.2012.6266972\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 International Conference on High Performance Computing & Simulation (HPCS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCSim.2012.6266972","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 2

摘要

软件流水线是一种强大的技术，可以暴露细粒度的并行性，但它会导致变量在多个内核迭代中保持活跃。它需要定期分配寄存器，并且对代码生成具有挑战性:目前缺乏可靠的解决方案限制了软件流水线的适用性。不改变计算吞吐量的经典软件解决方案是在后验展开循环[11]，[10]。然而，结果展开程度往往是不可接受的，可能达到荒谬的水平。另外，由于软件注册重命名，可以避免循环展开。这是通过插入移动操作来实现的，但这可能会增加启动间隔(II)，从而抵消了软件流水线的好处。本文旨在严格控制生成代码所必需的传递后循环展开。我们研究了实时范围分割的潜力，以减少内核循环展开，在不增加II的情况下引入额外的移动指令。我们提供了问题的完整形式化，算法和广泛的实验。我们的算法在大多数情况下产生较低的展开度-没有增加II。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

On the effectiveness of register moves to minimise post-pass unrolling in software pipelined loops

Software pipelining is a powerful technique to expose fine-grain parallelism, but it results in variables staying alive across more than one kernel iteration. It requires periodic register allocation and is challenging for code generation: the lack of a reliable solution currently restricts the applicability of software pipelining. The classical software solution that does not alter the computation throughput consists in unrolling the loop a posteriori [11], [10]. However, the resulting unrolling degree is often unacceptable and may reach absurd levels. Alternatively, loop unrolling can be avoided thanks to software register renaming. This is achieved through the insertion of move operations, but this may increase the initiation interval (II) which nullifies the benefits of software pipelining. This article aims at tightly controling the post-pass loop unrolling necessary to generate code. We study the potential of live range splitting to reduce kernel loop unrolling, introducing additional move instructions without inscreasing the II. We provide a complete formalisation of the problem, an algorithm, and extensive experiments. Our algorithm yields low unrolling degrees in most cases - with no increase of the II.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2012 International Conference on High Performance Computing & Simulation (HPCS)

自引率

0.00%

发文量