R. D. Weldon, Steven S. Chang, Hong Wang, Gerolf Hoflehner, P. Wang, Daniel M. Lavery, John Paul Shen
{"title":"对未来安腾处理器的寄存器堆栈引擎的定量评估和优化","authors":"R. D. Weldon, Steven S. Chang, Hong Wang, Gerolf Hoflehner, P. Wang, Daniel M. Lavery, John Paul Shen","doi":"10.1109/INTERA.2002.995843","DOIUrl":null,"url":null,"abstract":"This paper examines the efficiency of the register stack engine (RSE) in the canonical Itanium architecture, and introduces novel optimization techniques to enhance the RSE performance. To minimize spills and fills of the physical register file, optimizations are applied to reduce internal fragmentation in statically allocated register stack frames. Through the use of dynamic register usage (DRU) and dead register value information (DVI), the processor can dynamically guide allocation and deallocation of register frames. Consequently, a speculatively allocated register frame with a dynamically determined frame size can be much smaller than the statically determined frame size, thus achieving minimum spills and fills. Using the register stack engine (RSE) in the canonical Itanium architecture as the baseline reference, we thoroughly study and gauge the tradeoffs of the RSE and the proposed optimizations using a set of SPEC CPU2000 benchmarks built with different compiler optimizations. A combination of frame allocation policies using the most frequent frame size and deallocation policies using dead register information proves to be highly effective. On average, a 71% reduction in aggregate spills and fills can be achieved over the baseline reference.","PeriodicalId":224706,"journal":{"name":"Proceedings Sixth Annual Workshop on Interaction between Compilers and Computer Architectures","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2002-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":"{\"title\":\"Quantitative evaluation of the register stack engine and optimizations for future Itanium processors\",\"authors\":\"R. D. Weldon, Steven S. Chang, Hong Wang, Gerolf Hoflehner, P. Wang, Daniel M. Lavery, John Paul Shen\",\"doi\":\"10.1109/INTERA.2002.995843\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper examines the efficiency of the register stack engine (RSE) in the canonical Itanium architecture, and introduces novel optimization techniques to enhance the RSE performance. To minimize spills and fills of the physical register file, optimizations are applied to reduce internal fragmentation in statically allocated register stack frames. Through the use of dynamic register usage (DRU) and dead register value information (DVI), the processor can dynamically guide allocation and deallocation of register frames. Consequently, a speculatively allocated register frame with a dynamically determined frame size can be much smaller than the statically determined frame size, thus achieving minimum spills and fills. Using the register stack engine (RSE) in the canonical Itanium architecture as the baseline reference, we thoroughly study and gauge the tradeoffs of the RSE and the proposed optimizations using a set of SPEC CPU2000 benchmarks built with different compiler optimizations. A combination of frame allocation policies using the most frequent frame size and deallocation policies using dead register information proves to be highly effective. On average, a 71% reduction in aggregate spills and fills can be achieved over the baseline reference.\",\"PeriodicalId\":224706,\"journal\":{\"name\":\"Proceedings Sixth Annual Workshop on Interaction between Compilers and Computer Architectures\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2002-02-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"11\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings Sixth Annual Workshop on Interaction between Compilers and Computer Architectures\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/INTERA.2002.995843\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings Sixth Annual Workshop on Interaction between Compilers and Computer Architectures","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/INTERA.2002.995843","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Quantitative evaluation of the register stack engine and optimizations for future Itanium processors
This paper examines the efficiency of the register stack engine (RSE) in the canonical Itanium architecture, and introduces novel optimization techniques to enhance the RSE performance. To minimize spills and fills of the physical register file, optimizations are applied to reduce internal fragmentation in statically allocated register stack frames. Through the use of dynamic register usage (DRU) and dead register value information (DVI), the processor can dynamically guide allocation and deallocation of register frames. Consequently, a speculatively allocated register frame with a dynamically determined frame size can be much smaller than the statically determined frame size, thus achieving minimum spills and fills. Using the register stack engine (RSE) in the canonical Itanium architecture as the baseline reference, we thoroughly study and gauge the tradeoffs of the RSE and the proposed optimizations using a set of SPEC CPU2000 benchmarks built with different compiler optimizations. A combination of frame allocation policies using the most frequent frame size and deallocation policies using dead register information proves to be highly effective. On average, a 71% reduction in aggregate spills and fills can be achieved over the baseline reference.