CASINO Core Microarchitecture: Generating Out-of-Order Schedules Using Cascaded In-Order Scheduling Windows

2020 IEEE International Symposium on High Performance Computer Architecture (HPCA). Pub Date: 2020-02-01. DOI: 10.1109/HPCA47549.2020.00039

Ipoom Jeong, Seihoon Park, Changmin Lee, W. Ro

Citations: 9

Abstract

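The performance gap between in-order (InO) and out-of-order (OoO) cores comes from the ability to dynamically create highly optimized instruction issue schedules. In this work, we observe that a significant amount of performance benefit of OoO scheduling can also be attained by supplementing a traditional InO core with a small and speculative instruction scheduling window, namely SpecInO. SpecInO monitors a small set of instructions ahead of a conventional InO scheduling window, aiming at issuing ready instructions behind long-latency stalls. Simulation results show that SpecInO captures and issues 62% of dynamic instructions out of program order. To this end, we propose a CASINO core microarchitecture that dynamically and speculatively generates OoO schedules with near-InO complexity, using CAScaded IN-Order scheduling windows. A Speculative IQ (S-IQ) issues an instruction if it is ready, or otherwise passes it to the next IQ. At the last IQ, instructions are scheduled in program order along serial dependence chains. The net effect is OoO scheduling via collaboration between cascaded InO IQs. To support speculative execution with minimal cost overhead, we propose a novel register renaming technique that allocates free physical registers only to instructions issued from the S-IQ. The proposed core performs dynamic memory disambiguation via an on-commit value check by extending the store buffer already existing in an InO core. We further optimize energy efficiency by filtering out redundant associative searches performed by speculated loads. In our analysis, CASINO core improves performance by 51% over an InO core (within 10 percentage points of an OoO core), which results in 25% and 42% improvements in energy efficiency over InO and OoO cores, respectively.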
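To make the cascaded-IQ idea concrete, here is a minimal toy cycle loop in Python. It is an illustrative sketch, not the authors' simulator, and every name in it (`Insn`, `raw_producers`, `simulate`, the queue sizes, the shared issue `width`) is hypothetical. Assumptions: register renaming removes all WAR/WAW hazards so only true (RAW) dependences matter (a simplification of the paper's scheme, which renames only S-IQ-issued instructions); memory dependences and the on-commit value check are ignored; and the S-IQ makes one decision per cycle on its head. Setting `siq_size=0` degenerates to a plain in-order core for comparison.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Insn:
    seq: int                 # program-order sequence number (hypothetical trace format)
    dst: Optional[int]       # architectural destination register, or None
    srcs: Tuple[int, ...]    # architectural source registers
    latency: int             # execution latency in cycles (>= 1)


def raw_producers(trace):
    """For each instruction, the seq numbers of its RAW producers.
    ASSUMPTION: renaming has removed WAR/WAW hazards entirely."""
    last_writer, deps = {}, []
    for insn in trace:
        deps.append({last_writer[r] for r in insn.srcs if r in last_writer})
        if insn.dst is not None:
            last_writer[insn.dst] = insn.seq
    return deps


def simulate(trace, siq_size=4, iq_size=8, width=1):
    """Toy loop: fetch -> S-IQ -> last (in-order) IQ; no memory dependences.
    siq_size=0 models a plain in-order core (fetch feeds the IQ directly).
    Returns (completion_cycle, insns_issued_ahead_of_older_waiting_insns)."""
    deps = raw_producers(trace)
    done_at = {}                      # seq -> cycle its result becomes available
    siq, iq = deque(), deque()
    fetch_ptr, cycle, ooo = 0, 0, 0

    def ready(insn):
        return all(p in done_at and done_at[p] <= cycle for p in deps[insn.seq])

    while len(done_at) < len(trace):
        issued = 0
        # Last IQ: strictly program order; a non-ready head blocks those behind it.
        while iq and issued < width and ready(iq[0]):
            insn = iq.popleft()
            done_at[insn.seq] = cycle + insn.latency
            issued += 1
        # S-IQ: issue the head if ready, otherwise pass it down to the last IQ.
        if siq:
            head = siq[0]
            if issued < width and ready(head):
                siq.popleft()
                done_at[head.seq] = cycle + head.latency
                issued += 1
                if iq:                # older instructions are still waiting
                    ooo += 1
            elif not ready(head) and len(iq) < iq_size:
                iq.append(siq.popleft())
        # Fetch up to `width` instructions into the first queue.
        first, cap = (siq, siq_size) if siq_size > 0 else (iq, iq_size)
        for _ in range(width):
            if fetch_ptr < len(trace) and len(first) < cap:
                first.append(trace[fetch_ptr])
                fetch_ptr += 1
        cycle += 1
    return max(done_at.values()), ooo


# Hypothetical trace: a long-latency load followed by independent work.
trace = [
    Insn(0, 1, (), 10),    # r1 <- load (10 cycles)
    Insn(1, 2, (1,), 1),   # r2 <- f(r1), stalls on the load
    Insn(2, 3, (), 1),     # independent
    Insn(3, 4, (3,), 1),   # depends on insn 2 only
    Insn(4, 5, (), 1),     # independent
]
print(simulate(trace, siq_size=0))  # (15, 0): plain in-order core
print(simulate(trace, siq_size=2))  # (12, 3): S-IQ issues 3 insns past the stall
```

In this toy trace the in-order core leaves the independent instructions stuck behind the dependent of the load, whereas the S-IQ passes only the non-ready instruction down and issues the rest ahead of it. The numbers illustrate the mechanism only and are unrelated to the paper's reported results.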

Source journal

2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)

Self-citation rate

0.00%

Publication count