Quantitative evaluation of the register stack engine and optimizations for future Itanium processors

Proceedings Sixth Annual Workshop on Interaction between Compilers and Computer Architectures. Pub Date: 2002-02-03. DOI: 10.1109/INTERA.2002.995843

R. D. Weldon, Steven S. Chang, Hong Wang, Gerolf Hoflehner, P. Wang, Daniel M. Lavery, John Paul Shen


Citations: 11

Abstract

This paper examines the efficiency of the register stack engine (RSE) in the canonical Itanium architecture, and introduces novel optimization techniques to enhance the RSE performance. To minimize spills and fills of the physical register file, optimizations are applied to reduce internal fragmentation in statically allocated register stack frames. Through the use of dynamic register usage (DRU) and dead register value information (DVI), the processor can dynamically guide allocation and deallocation of register frames. Consequently, a speculatively allocated register frame with a dynamically determined frame size can be much smaller than the statically determined frame size, thus achieving minimum spills and fills. Using the register stack engine (RSE) in the canonical Itanium architecture as the baseline reference, we thoroughly study and gauge the tradeoffs of the RSE and the proposed optimizations using a set of SPEC CPU2000 benchmarks built with different compiler optimizations. A combination of frame allocation policies using the most frequent frame size and deallocation policies using dead register information proves to be highly effective. On average, a 71% reduction in aggregate spills and fills can be achieved over the baseline reference.
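The link between frame size and spill/fill traffic described above can be illustrated with a toy model. The sketch below is not the paper's simulator or its DRU/DVI mechanism: it assumes a simplified register stack in which each call allocates a whole frame, the oldest registers are spilled only when the physical file overflows, and a caller's frame is filled back only when a return needs it. The names `simulate_rse` and `nested_calls` are hypothetical, and the default capacity of 96 corresponds to the stacked general registers of the Itanium architecture.

```python
from typing import List, Optional, Tuple


def simulate_rse(trace: List[Optional[int]], capacity: int = 96) -> Tuple[int, int]:
    """Count register spills and fills for a call/return trace (toy model).

    trace: an int is a call that allocates a frame of that many registers;
           None is a return from the current frame.
    capacity: number of physical stacked registers (assumed 96).
    """
    frames: List[int] = []  # frame sizes of all active procedures
    resident = 0            # registers at the top of the stack held physically
    spills = 0
    fills = 0
    for event in trace:
        if event is not None:
            # Call: allocate a new frame on top of the register stack.
            if not 0 <= event <= capacity:
                raise ValueError("frame size must fit in the physical file")
            frames.append(event)
            resident += event
            if resident > capacity:
                # Overflow: spill the oldest registers to memory.
                spills += resident - capacity
                resident = capacity
        else:
            # Return: free the callee frame, then make sure the caller's
            # frame is fully resident, filling from memory if needed.
            resident -= frames.pop()
            if frames and resident < frames[-1]:
                fills += frames[-1] - resident
                resident = frames[-1]
    return spills, fills


def nested_calls(depth: int, frame_size: int) -> List[Optional[int]]:
    """Hypothetical workload: `depth` nested calls followed by `depth` returns."""
    return [frame_size] * depth + [None] * depth


if __name__ == "__main__":
    # Frames sized by a static (compile-time) estimate vs. smaller frames
    # sized by the registers actually used; both sizes are illustrative.
    print("static frames :", simulate_rse(nested_calls(10, 16)))
    print("smaller frames:", simulate_rse(nested_calls(10, 8)))
```

Under these assumptions, ten nested calls with 16-register frames overflow the 96-register file and cost 64 spills and 64 fills, while the same call chain with 8-register frames fits entirely and costs none. The numbers only illustrate why smaller frames reduce traffic; they say nothing about the 71% average reported for the SPEC CPU2000 benchmarks.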
