BOLT:节能的乱序容忍延迟执行

Andrew D. Hilton, A. Roth
{"title":"BOLT:节能的乱序容忍延迟执行","authors":"Andrew D. Hilton, A. Roth","doi":"10.1109/HPCA.2010.5416634","DOIUrl":null,"url":null,"abstract":"LT (latency tolerant) execution is an attractive candidate technique for future out-of-order cores. LT defers the forward slices of LLC (last-level cache) misses to a slice buffer and re-executes them when the misses return. An LT core increases ILP without physically scaling the issue queue and register file and increases MLP without additional software threads that can reduce cache performance. Unfortunately, proposed LT designs are not energy efficient. They require too many additional structures and they defer and re-execute too many instructions to justify their performance gains. In this paper, we address these inefficiencies. We introduce a microarchitecture called BOLT (Better Out-of-Order Latency-Tolerance) that implements LT as an alternative use of SMT (Simultaneous Multi-Threading). We also present a new slice buffer organization and traversal scheme that increases performance and reduces overhead by pruning instances of useless and redundant LT. Collectively, these modifications turn out-of-order LT into a technique that improves performance in an energy-efficient way.","PeriodicalId":368621,"journal":{"name":"HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture","volume":"966 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"28","resultStr":"{\"title\":\"BOLT: Energy-efficient Out-of-Order Latency-Tolerant execution\",\"authors\":\"Andrew D. Hilton, A. Roth\",\"doi\":\"10.1109/HPCA.2010.5416634\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"LT (latency tolerant) execution is an attractive candidate technique for future out-of-order cores. LT defers the forward slices of LLC (last-level cache) misses to a slice buffer and re-executes them when the misses return. An LT core increases ILP without physically scaling the issue queue and register file and increases MLP without additional software threads that can reduce cache performance. Unfortunately, proposed LT designs are not energy efficient. They require too many additional structures and they defer and re-execute too many instructions to justify their performance gains. In this paper, we address these inefficiencies. We introduce a microarchitecture called BOLT (Better Out-of-Order Latency-Tolerance) that implements LT as an alternative use of SMT (Simultaneous Multi-Threading). We also present a new slice buffer organization and traversal scheme that increases performance and reduces overhead by pruning instances of useless and redundant LT. Collectively, these modifications turn out-of-order LT into a technique that improves performance in an energy-efficient way.\",\"PeriodicalId\":368621,\"journal\":{\"name\":\"HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture\",\"volume\":\"966 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-04-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"28\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HPCA.2010.5416634\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA.2010.5416634","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 28

摘要

LT(容忍延迟)执行是未来乱序核的一种有吸引力的候选技术。LT将LLC(最后一级缓存)错误的前向切片延迟到一个切片缓冲区,并在错误返回时重新执行它们。LT核心可以提高ILP,而无需物理地扩展问题队列和注册文件,并且可以提高MLP,而无需额外的软件线程,从而降低缓存性能。不幸的是,拟议的LT设计并不节能。它们需要太多额外的结构,它们延迟和重新执行太多指令,以证明它们的性能收益是合理的。在本文中,我们将解决这些低效率问题。我们介绍了一种称为BOLT(更好的乱序延迟容忍)的微架构,它实现了LT作为SMT(同步多线程)的替代使用。我们还提出了一种新的切片缓冲组织和遍历方案,通过修剪无用和冗余的LT实例来提高性能并减少开销。总的来说,这些修改将无序LT转变为一种以节能方式提高性能的技术。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
BOLT: Energy-efficient Out-of-Order Latency-Tolerant execution
LT (latency tolerant) execution is an attractive candidate technique for future out-of-order cores. LT defers the forward slices of LLC (last-level cache) misses to a slice buffer and re-executes them when the misses return. An LT core increases ILP without physically scaling the issue queue and register file and increases MLP without additional software threads that can reduce cache performance. Unfortunately, proposed LT designs are not energy efficient. They require too many additional structures and they defer and re-execute too many instructions to justify their performance gains. In this paper, we address these inefficiencies. We introduce a microarchitecture called BOLT (Better Out-of-Order Latency-Tolerance) that implements LT as an alternative use of SMT (Simultaneous Multi-Threading). We also present a new slice buffer organization and traversal scheme that increases performance and reduces overhead by pruning instances of useless and redundant LT. Collectively, these modifications turn out-of-order LT into a technique that improves performance in an energy-efficient way.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信