M. Alipour, S. Kaxiras, D. Black-Schaffer, Rakesh Kumar
Published in: 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA), 2020-02-01
DOI: 10.1109/HPCA47549.2020.00042 (https://doi.org/10.1109/HPCA47549.2020.00042)
Citations: 13
Delay and Bypass: Ready and Criticality Aware Instruction Scheduling in Out-of-Order Processors
Flexible instruction scheduling is essential for performance in out-of-order processors. This is typically achieved by using CAM-based Instruction Queues (IQs) that provide complete flexibility in choosing ready instructions for execution, but at the cost of significant scheduling energy. In this work, we seek to reduce the instruction scheduling energy by reducing the depth and width of the IQ. We do so by classifying instructions based on their readiness and criticality, and using this information to bypass the IQ for instructions that will not benefit from its expensive scheduling structures and to delay instructions whose delay will not harm performance. Combined, these approaches allow us to offload a significant portion of the instructions from the IQ to much cheaper FIFO-based scheduling structures without hurting performance. As a result, we can reduce the IQ depth and width by half, thereby saving energy. Our design, Delay and Bypass (DNB), is the first design to explicitly address both readiness and criticality to reduce scheduling energy. By handling both classes, we are able to achieve 95% of the baseline out-of-order performance while using only 33% of the scheduling energy. This represents a significant improvement over previous designs that addressed only criticality or only readiness (91% and 89% of baseline performance at 74% and 53% of the scheduling energy, respectively).
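The classification idea in the abstract can be illustrated with a toy steering function. This is a minimal sketch under stated assumptions, not the authors' implementation: the mapping used here (ready at dispatch goes to a bypass FIFO, not ready and non-critical goes to a delay FIFO, not ready and critical goes to the CAM-based IQ) is one reading of the abstract, and the names `Instr`, `Target`, `steer`, `dispatch`, and the two boolean flags are hypothetical. How readiness and criticality are determined in hardware is not described in the abstract and is not modeled.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    """Hypothetical scheduling structures an instruction can be steered to."""
    BYPASS_FIFO = "bypass_fifo"  # assumed: already ready, needs no wakeup logic
    DELAY_FIFO = "delay_fifo"    # assumed: not ready and not critical, can wait in order
    IQ = "iq"                    # assumed: not ready but critical, needs full flexibility


@dataclass
class Instr:
    name: str
    ready: bool     # hypothetical flag: all source operands available at dispatch
    critical: bool  # hypothetical flag: predicted to lie on the critical path


def steer(instr: Instr) -> Target:
    """Pick a structure for one instruction (illustrative policy only)."""
    if instr.ready:
        return Target.BYPASS_FIFO
    if not instr.critical:
        return Target.DELAY_FIFO
    return Target.IQ


def dispatch(instrs):
    """Distribute instructions over the three structures in program order."""
    queues = {target: deque() for target in Target}
    for instr in instrs:
        queues[steer(instr)].append(instr.name)
    return queues


if __name__ == "__main__":
    stream = [
        Instr("add", ready=True, critical=False),
        Instr("load", ready=False, critical=True),
        Instr("mul", ready=False, critical=False),
    ]
    for target, queue in dispatch(stream).items():
        print(target.value, list(queue))
```

In this toy model only the instructions that are both not ready and critical occupy the IQ, which is the intuition behind being able to shrink it; the sketch says nothing about the energy or performance figures reported above.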