BOLT: Energy-efficient Out-of-Order Latency-Tolerant Execution

HPCA-16 2010: The Sixteenth International Symposium on High-Performance Computer Architecture. Pub Date: 2010-04-01. DOI: 10.1109/HPCA.2010.5416634

Andrew D. Hilton, A. Roth


Citations: 28

Abstract

LT (latency-tolerant) execution is an attractive candidate technique for future out-of-order cores. LT defers the forward slices of LLC (last-level cache) misses to a slice buffer and re-executes them when the misses return. An LT core increases ILP without physically scaling the issue queue and register file, and increases MLP without additional software threads that can reduce cache performance. Unfortunately, proposed LT designs are not energy efficient. They require too many additional structures, and they defer and re-execute too many instructions to justify their performance gains. In this paper, we address these inefficiencies. We introduce a microarchitecture called BOLT (Better Out-of-Order Latency-Tolerance) that implements LT as an alternative use of SMT (Simultaneous Multi-Threading). We also present a new slice buffer organization and traversal scheme that increases performance and reduces overhead by pruning instances of useless and redundant LT. Collectively, these modifications turn out-of-order LT into a technique that improves performance in an energy-efficient way.
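To make the idea of deferring a forward slice concrete, here is a minimal toy sketch in Python. It is not BOLT's design or the authors' implementation. It assumes a simple "poison" propagation rule that the abstract does not specify: a load that misses the LLC poisons its destination register, any instruction reading a poisoned register is deferred and poisons its own destination, and an instruction with clean inputs executes normally and clears any poison on its destination. The names `Instr`, `llc_miss`, and `split_forward_slice` are hypothetical, and the missing load is counted as the head of its own slice purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Instr:
    name: str
    dest: str
    srcs: Tuple[str, ...] = ()
    llc_miss: bool = False  # hypothetical flag: this load misses the LLC


def split_forward_slice(program: List[Instr]) -> Tuple[List[str], List[str]]:
    """Partition a program-order instruction list into executed and deferred names.

    Toy model only: no timing, no re-execution, no memory dependences.
    """
    poisoned: Set[str] = set()    # registers whose values depend on a pending miss
    executed: List[str] = []      # miss-independent instructions
    slice_buffer: List[str] = []  # deferred miss-dependent instructions

    for ins in program:
        depends_on_miss = any(src in poisoned for src in ins.srcs)
        if ins.llc_miss or depends_on_miss:
            slice_buffer.append(ins.name)
            poisoned.add(ins.dest)      # result is unavailable until the miss returns
        else:
            executed.append(ins.name)
            poisoned.discard(ins.dest)  # a clean write overwrites any stale poison
    return executed, slice_buffer


program = [
    Instr("ld1", dest="r1", srcs=("r0",), llc_miss=True),
    Instr("add1", dest="r2", srcs=("r1",)),
    Instr("ld2", dest="r3", srcs=("r4",)),
    Instr("mul1", dest="r5", srcs=("r3",)),
    Instr("add2", dest="r6", srcs=("r2", "r5")),
    Instr("mov1", dest="r1", srcs=("r3",)),
    Instr("sub1", dest="r7", srcs=("r1",)),
]

executed, deferred = split_forward_slice(program)
print("executed:", executed)
print("deferred:", deferred)
```

In this example `ld1`, `add1`, and `add2` end up in the slice buffer, while `ld2`, `mul1`, `mov1`, and `sub1` proceed past the miss. Note that `sub1` executes because `mov1` overwrote `r1` with a miss-independent value.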
