对未来安腾处理器的寄存器堆栈引擎的定量评估和优化

R. D. Weldon, Steven S. Chang, Hong Wang, Gerolf Hoflehner, P. Wang, Daniel M. Lavery, John Paul Shen
{"title":"对未来安腾处理器的寄存器堆栈引擎的定量评估和优化","authors":"R. D. Weldon, Steven S. Chang, Hong Wang, Gerolf Hoflehner, P. Wang, Daniel M. Lavery, John Paul Shen","doi":"10.1109/INTERA.2002.995843","DOIUrl":null,"url":null,"abstract":"This paper examines the efficiency of the register stack engine (RSE) in the canonical Itanium architecture, and introduces novel optimization techniques to enhance the RSE performance. To minimize spills and fills of the physical register file, optimizations are applied to reduce internal fragmentation in statically allocated register stack frames. Through the use of dynamic register usage (DRU) and dead register value information (DVI), the processor can dynamically guide allocation and deallocation of register frames. Consequently, a speculatively allocated register frame with a dynamically determined frame size can be much smaller than the statically determined frame size, thus achieving minimum spills and fills. Using the register stack engine (RSE) in the canonical Itanium architecture as the baseline reference, we thoroughly study and gauge the tradeoffs of the RSE and the proposed optimizations using a set of SPEC CPU2000 benchmarks built with different compiler optimizations. A combination of frame allocation policies using the most frequent frame size and deallocation policies using dead register information proves to be highly effective. On average, a 71% reduction in aggregate spills and fills can be achieved over the baseline reference.","PeriodicalId":224706,"journal":{"name":"Proceedings Sixth Annual Workshop on Interaction between Compilers and Computer Architectures","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2002-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":"{\"title\":\"Quantitative evaluation of the register stack engine and optimizations for future Itanium processors\",\"authors\":\"R. D. Weldon, Steven S. Chang, Hong Wang, Gerolf Hoflehner, P. Wang, Daniel M. Lavery, John Paul Shen\",\"doi\":\"10.1109/INTERA.2002.995843\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper examines the efficiency of the register stack engine (RSE) in the canonical Itanium architecture, and introduces novel optimization techniques to enhance the RSE performance. To minimize spills and fills of the physical register file, optimizations are applied to reduce internal fragmentation in statically allocated register stack frames. Through the use of dynamic register usage (DRU) and dead register value information (DVI), the processor can dynamically guide allocation and deallocation of register frames. Consequently, a speculatively allocated register frame with a dynamically determined frame size can be much smaller than the statically determined frame size, thus achieving minimum spills and fills. Using the register stack engine (RSE) in the canonical Itanium architecture as the baseline reference, we thoroughly study and gauge the tradeoffs of the RSE and the proposed optimizations using a set of SPEC CPU2000 benchmarks built with different compiler optimizations. A combination of frame allocation policies using the most frequent frame size and deallocation policies using dead register information proves to be highly effective. On average, a 71% reduction in aggregate spills and fills can be achieved over the baseline reference.\",\"PeriodicalId\":224706,\"journal\":{\"name\":\"Proceedings Sixth Annual Workshop on Interaction between Compilers and Computer Architectures\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2002-02-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"11\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings Sixth Annual Workshop on Interaction between Compilers and Computer Architectures\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/INTERA.2002.995843\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings Sixth Annual Workshop on Interaction between Compilers and Computer Architectures","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/INTERA.2002.995843","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 11

摘要

本文研究了标准Itanium体系结构中寄存器堆栈引擎(RSE)的效率,并引入了新的优化技术来提高RSE的性能。为了尽量减少物理寄存器文件的溢出和填充,应用优化来减少静态分配的寄存器堆栈帧中的内部碎片。通过利用动态寄存器使用率(DRU)和死寄存器值信息(DVI),处理器可以动态地指导寄存器帧的分配和释放。因此,具有动态确定的帧大小的推测分配的寄存器帧可以比静态确定的帧大小小得多,从而实现最小的溢出和填充。使用标准Itanium体系结构中的寄存器堆栈引擎(RSE)作为基准参考,我们使用一组使用不同编译器优化构建的SPEC CPU2000基准测试,对RSE和建议的优化进行了全面的研究和衡量。使用最常见的帧大小的帧分配策略和使用死寄存器信息的重分配策略的组合被证明是非常有效的。平均而言,在基准基准的基础上,可以减少71%的骨料泄漏和填充。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Quantitative evaluation of the register stack engine and optimizations for future Itanium processors
This paper examines the efficiency of the register stack engine (RSE) in the canonical Itanium architecture, and introduces novel optimization techniques to enhance the RSE performance. To minimize spills and fills of the physical register file, optimizations are applied to reduce internal fragmentation in statically allocated register stack frames. Through the use of dynamic register usage (DRU) and dead register value information (DVI), the processor can dynamically guide allocation and deallocation of register frames. Consequently, a speculatively allocated register frame with a dynamically determined frame size can be much smaller than the statically determined frame size, thus achieving minimum spills and fills. Using the register stack engine (RSE) in the canonical Itanium architecture as the baseline reference, we thoroughly study and gauge the tradeoffs of the RSE and the proposed optimizations using a set of SPEC CPU2000 benchmarks built with different compiler optimizations. A combination of frame allocation policies using the most frequent frame size and deallocation policies using dead register information proves to be highly effective. On average, a 71% reduction in aggregate spills and fills can be achieved over the baseline reference.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信