指令寄存器文件设计中的能量延迟权衡

Joonas Multanen, Heikki O. Kultala, P. Jääskeläinen
{"title":"指令寄存器文件设计中的能量延迟权衡","authors":"Joonas Multanen, Heikki O. Kultala, P. Jääskeläinen","doi":"10.1109/NORCHIP.2018.8573504","DOIUrl":null,"url":null,"abstract":"In order to decrease latency and energy consumption, processors use hierarchical memory systems to store temporally and spatially related instructions close to the core. Instruction register file (IRF) is an energy-efficient solution for the lowest level in the instruction memory hierarcy. Being compiler-controlled, it removes the area and energy overheads involved in cache tag checking and adds flexibility in the separation of the instruction fetch and execution. In this paper, we systematically evaluate for the first time the effect of three IRF design variations on energy and delay against an unoptimized baseline IRF. Having instruction fetch and decode with IRF in the same pipeline stage allows minimal delay branching, but results in low operating clock frequency and impaired energy delay product compared to splitting them into two stages. Assuring instruction presence in IRF before execution with software reduces the area and increases maximum clock frequency compared to assurance with hardware, but requires compiler analysis. With a proposed compiler-analyzed instruction placement and co-designed hardware implementation, energy consumption with the best IRF variant is reduced by 9% on average with EEMBC Coremark and CHStone benchmaks. The energy delay product is improved by 23% when compared to the baseline IRF approach.","PeriodicalId":152077,"journal":{"name":"2018 IEEE Nordic Circuits and Systems Conference (NORCAS): NORCHIP and International Symposium of System-on-Chip (SoC)","volume":"66 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Energy-Delay Trade-Offs in Instruction Register File Design\",\"authors\":\"Joonas Multanen, Heikki O. Kultala, P. Jääskeläinen\",\"doi\":\"10.1109/NORCHIP.2018.8573504\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In order to decrease latency and energy consumption, processors use hierarchical memory systems to store temporally and spatially related instructions close to the core. Instruction register file (IRF) is an energy-efficient solution for the lowest level in the instruction memory hierarcy. Being compiler-controlled, it removes the area and energy overheads involved in cache tag checking and adds flexibility in the separation of the instruction fetch and execution. In this paper, we systematically evaluate for the first time the effect of three IRF design variations on energy and delay against an unoptimized baseline IRF. Having instruction fetch and decode with IRF in the same pipeline stage allows minimal delay branching, but results in low operating clock frequency and impaired energy delay product compared to splitting them into two stages. Assuring instruction presence in IRF before execution with software reduces the area and increases maximum clock frequency compared to assurance with hardware, but requires compiler analysis. With a proposed compiler-analyzed instruction placement and co-designed hardware implementation, energy consumption with the best IRF variant is reduced by 9% on average with EEMBC Coremark and CHStone benchmaks. The energy delay product is improved by 23% when compared to the baseline IRF approach.\",\"PeriodicalId\":152077,\"journal\":{\"name\":\"2018 IEEE Nordic Circuits and Systems Conference (NORCAS): NORCHIP and International Symposium of System-on-Chip (SoC)\",\"volume\":\"66 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 IEEE Nordic Circuits and Systems Conference (NORCAS): NORCHIP and International Symposium of System-on-Chip (SoC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/NORCHIP.2018.8573504\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE Nordic Circuits and Systems Conference (NORCAS): NORCHIP and International Symposium of System-on-Chip (SoC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/NORCHIP.2018.8573504","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

摘要

为了减少延迟和能量消耗,处理器使用分层存储系统将时间和空间相关的指令存储在靠近核心的地方。指令寄存器文件(IRF)是指令存储器层次结构中最低层的一种节能解决方案。由于由编译器控制,它消除了缓存标记检查所涉及的面积和能量开销,并增加了指令获取和执行分离的灵活性。在本文中,我们首次系统地评估了三种IRF设计变化对未优化基线IRF的能量和延迟的影响。在同一管道阶段使用IRF进行指令提取和解码允许最小的延迟分支,但与将它们分成两个阶段相比,会导致低工作时钟频率和受损的能量延迟积。与使用硬件相比,使用软件在执行IRF之前确保指令存在可以减少面积并增加最大时钟频率,但需要编译器分析。通过编译器分析的指令放置和共同设计的硬件实现,使用EEMBC Coremark和CHStone基准测试,最佳IRF变体的能耗平均降低了9%。与基线IRF方法相比,能量延迟积提高了23%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Energy-Delay Trade-Offs in Instruction Register File Design
In order to decrease latency and energy consumption, processors use hierarchical memory systems to store temporally and spatially related instructions close to the core. Instruction register file (IRF) is an energy-efficient solution for the lowest level in the instruction memory hierarcy. Being compiler-controlled, it removes the area and energy overheads involved in cache tag checking and adds flexibility in the separation of the instruction fetch and execution. In this paper, we systematically evaluate for the first time the effect of three IRF design variations on energy and delay against an unoptimized baseline IRF. Having instruction fetch and decode with IRF in the same pipeline stage allows minimal delay branching, but results in low operating clock frequency and impaired energy delay product compared to splitting them into two stages. Assuring instruction presence in IRF before execution with software reduces the area and increases maximum clock frequency compared to assurance with hardware, but requires compiler analysis. With a proposed compiler-analyzed instruction placement and co-designed hardware implementation, energy consumption with the best IRF variant is reduced by 9% on average with EEMBC Coremark and CHStone benchmaks. The energy delay product is improved by 23% when compared to the baseline IRF approach.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信