On the effectiveness of register moves to minimise post-pass unrolling in software pipelined loops

Mounira Bachir, Albert Cohen, S. Touati
Published in: 2012 International Conference on High Performance Computing & Simulation (HPCS), 2012-07-02
DOI: 10.1109/HPCSim.2012.6266972 (https://doi.org/10.1109/HPCSim.2012.6266972)
Citations: 2

Abstract

Software pipelining is a powerful technique to expose fine-grain parallelism, but it results in variables staying alive across more than one kernel iteration. It requires periodic register allocation and is challenging for code generation: the lack of a reliable solution currently restricts the applicability of software pipelining. The classical software solution that does not alter the computation throughput consists of unrolling the loop a posteriori [11], [10]. However, the resulting unrolling degree is often unacceptable and may reach absurd levels. Alternatively, loop unrolling can be avoided thanks to software register renaming. This is achieved through the insertion of move operations, but this may increase the initiation interval (II), which nullifies the benefits of software pipelining. This article aims at tightly controlling the post-pass loop unrolling necessary to generate code. We study the potential of live range splitting to reduce kernel loop unrolling, introducing additional move instructions without increasing the II. We provide a complete formalisation of the problem, an algorithm, and extensive experiments. Our algorithm yields low unrolling degrees in most cases, with no increase of the II.
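The trade-off described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm; it assumes the textbook modulo-variable-expansion view, in which a variable whose lifetime is L cycles needs ceil(L / II) register copies, the kernel is unrolled by the least common multiple of those counts, and renaming by moves costs one move per extra copy per iteration. The function names and the example lifetimes are hypothetical.

```python
from math import lcm  # available in Python 3.9+


def copies_needed(lifetime: int, ii: int) -> int:
    """Register copies for one variable: ceil(lifetime / II), at least 1.

    Assumption: lifetimes and II are positive integers measured in cycles.
    """
    return max(1, -(-lifetime // ii))  # integer ceiling division


def unroll_degree(lifetimes: list[int], ii: int) -> int:
    """Kernel unrolling degree when every variable keeps its minimal
    number of copies (assumed LCM rule of modulo variable expansion)."""
    return lcm(*(copies_needed(life, ii) for life in lifetimes))


def moves_per_iteration(lifetimes: list[int], ii: int) -> int:
    """Moves inserted per iteration if renaming is done purely by moves:
    a variable with q copies shifts its value through q - 1 moves."""
    return sum(copies_needed(life, ii) - 1 for life in lifetimes)


if __name__ == "__main__":
    ii = 2
    lifetimes = [3, 5, 7]  # hypothetical lifetimes, in cycles
    print("unroll degree:", unroll_degree(lifetimes, ii))
    print("moves per iteration:", moves_per_iteration(lifetimes, ii))
```

Under these assumptions, three variables needing 2, 3 and 4 copies force an unrolling degree of 12, whereas pure move insertion needs 6 extra operations per iteration, which may no longer fit within the II. The article targets the space between these two extremes.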