A two-phase recovery mechanism

Proceedings of the 2018 International Conference on Supercomputing. Pub Date: 2018-06-12. DOI: 10.1145/3205289.3205300

Zhaoxiang Jin, Soner Önder

Abstract

Superscalar processors take advantage of speculative execution to improve performance. When the speculation turns out to be incorrect, a recovery procedure is initiated. The back-end of the processor cannot simply be flushed because it holds a mixture of valid and invalid instructions. A basic solution is to wait for all valid instructions to retire and then purge the invalid instructions. However, if a long-latency operation, such as a last-level cache (LLC) miss, appears before the misspeculation point, the back-end recovery time increases significantly. Many proposed mechanisms selectively flush invalid instructions to speed up the back-end recovery. In general, these mechanisms rely on broadcasting misprediction-related tags to remove the instructions from back-end structures such as the ROB, LSQ, and RS. The hardware overhead of these mechanisms is nontrivial and can affect the processor clock cycle time if they are on the critical path. Moreover, a checkpointing mechanism or a walker needs to be added to accelerate the recovery of the front-end register alias table (F-RAT). We propose a two-phase recovery mechanism which does not need any walking or broadcasting process and can still match the performance of state-of-the-art recovery approaches. The first phase works similarly to a typical basic recovery mechanism, and the second phase is not triggered until the back-end is stalled by a load that misses in the LLC. In that case, the second phase treats the load as a misspeculation and recovers from this load. Since the LLC miss response time is usually much longer than the time to fill the entire pipeline with new instructions, in most cases our mechanism can completely overlap the branch misprediction recovery penalty with the cache miss penalty.
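
The overlap argument in the last sentence can be illustrated with a toy timing model. This is only a sketch under simplifying assumptions and not the authors' evaluation methodology: the function names, the parameters `miss_remaining` (cycles until the LLC miss returns) and `refill` (cycles to fill the pipeline with new instructions), and the example cycle counts are all hypothetical. It assumes that basic recovery cannot start refilling until the missing load has retired, while the second phase starts refilling from the load immediately.

```python
# Toy timing model (hypothetical; all names and numbers are illustrative
# assumptions, not taken from the paper).

def basic_recovery_cycles(miss_remaining: int, refill: int) -> int:
    """Basic recovery: wait for the older LLC-miss load to retire,
    then purge the invalid instructions and refill the pipeline."""
    return miss_remaining + refill


def two_phase_recovery_cycles(miss_remaining: int, refill: int) -> int:
    """Second phase: treat the stalled load as the misspeculation point,
    so the refill proceeds while the miss is still outstanding."""
    return max(miss_remaining, refill)


def exposed_penalty(miss_remaining: int, refill: int) -> int:
    """Refill cycles that are NOT hidden behind the cache miss."""
    return two_phase_recovery_cycles(miss_remaining, refill) - miss_remaining


if __name__ == "__main__":
    # Assumed example: 200 cycles left on the miss, 20 cycles to refill.
    print(basic_recovery_cycles(200, 20))      # serialised: miss, then refill
    print(two_phase_recovery_cycles(200, 20))  # refill hidden behind the miss
    print(exposed_penalty(200, 20))            # nothing left exposed
```

In this model the refill cost is fully hidden whenever `miss_remaining >= refill`, which corresponds to the abstract's "in most cases" condition; when the miss is nearly complete, part of the refill remains exposed.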
