Out-of-order commit processors

10th International Symposium on High Performance Computer Architecture (HPCA'04). Publication date: 2004-02-14. DOI: 10.1109/HPCA.2004.10008

A. Cristal, Daniel Ortega, J. Llosa, M. Valero


Citation count: 148

Abstract

Modern out-of-order processors tolerate long-latency memory operations by supporting a large number of in-flight instructions. This is particularly useful in numerical applications, where branch speculation is normally not a problem and where the cache hierarchy cannot deliver data soon enough. Supporting more in-flight instructions requires up-sizing several resources: the reorder buffer (ROB), the general-purpose instruction queues, the load/store queue, and the number of physical registers in the processor. However, scaling up the number of entries in these resources is impractical because of area, cycle time, and power consumption constraints. We propose to increase the capacity of future processors by augmenting the number of in-flight instructions. Instead of simply up-sizing resources, we push for novel microarchitectural structures that achieve the same performance benefits with a much lower need for resources. Our main contribution is a new checkpointing mechanism that can keep thousands of instructions in flight at a practically constant cost. We also propose a queuing mechanism that takes advantage of the differences in waiting time among the instructions in the flow. Using these two mechanisms, our processor shows a performance degradation of only 10% on SPEC2000fp relative to a conventional processor that requires more than an order of magnitude more entries in the ROB and instruction queues, and about a 200% improvement over a current processor with a similar number of entries.
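The abstract does not spell out how the checkpointing mechanism works. As a rough illustration of the general idea only, the Python sketch below assumes that a checkpoint is taken every `interval` instructions and that each checkpoint keeps a single counter of instructions that have not yet completed. The class `CheckpointTable`, its methods, and the fixed-size windows are hypothetical names and simplifications chosen for this example, not details taken from the paper. The point it tries to show is that tracking state per checkpoint, rather than per instruction as a ROB does, lets instructions finish in any order while still committing in program-order groups.

```python
class CheckpointTable:
    """Toy model: one pending-instruction counter per checkpoint (assumption)."""

    def __init__(self, interval: int):
        self.interval = interval  # instructions covered by one checkpoint (assumed fixed)
        self.pending = {}         # checkpoint id -> number of incomplete instructions
        self.next_seq = 0         # sequence number of the next instruction to dispatch
        self.committed = 0        # every instruction with seq < committed is committed

    def dispatch(self) -> int:
        """Dispatch one instruction in program order and return its sequence number."""
        seq = self.next_seq
        self.next_seq += 1
        cp = seq // self.interval
        self.pending[cp] = self.pending.get(cp, 0) + 1
        return seq

    def complete(self, seq: int) -> None:
        """Mark an in-flight instruction as finished; completion order is arbitrary."""
        self.pending[seq // self.interval] -= 1
        self._try_commit()

    def _try_commit(self) -> None:
        # Release whole windows, oldest first. A window can be released once it
        # is fully dispatched and none of its instructions is still pending.
        while True:
            cp = self.committed // self.interval
            window_end = (cp + 1) * self.interval
            if self.next_seq < window_end or self.pending.get(cp, 0) != 0:
                return
            self.pending.pop(cp, None)
            self.committed = window_end

    def rollback(self, seq: int) -> int:
        """Squash back to the checkpoint covering seq; return where fetch restarts."""
        cp = seq // self.interval
        assert cp * self.interval >= self.committed, "cannot undo committed work"
        for younger in [c for c in self.pending if c >= cp]:
            del self.pending[younger]
        self.next_seq = cp * self.interval
        return self.next_seq


def demo():
    table = CheckpointTable(interval=4)
    for _ in range(8):
        table.dispatch()
    # The younger window (instructions 4..7) finishes first.
    for seq in (7, 5, 6, 4):
        table.complete(seq)
    committed_early = table.committed  # oldest window still has pending work
    # The older window (instructions 0..3) finishes later, in any order.
    for seq in (3, 0, 2, 1):
        table.complete(seq)
    return committed_early, table.committed


if __name__ == "__main__":
    print(demo())
```

In this sketch the bookkeeping grows with the number of live checkpoints rather than with the number of in-flight instructions, which is one plausible reading of "practically constant cost". The price is that recovery restarts from the checkpoint rather than from the faulting instruction, so some already finished work is re-executed.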


