Complexity-Effective Superscalar Processors

Conference Proceedings. The 24th Annual International Symposium on Computer Architecture. Pub Date: 1997-06-01. DOI: 10.1145/264107.264201

Subbarao Palacharla, N. Jouppi, James E. Smith


Citations: 921

Abstract

The performance tradeoff between hardware complexity and clock speed is studied. First, a generic superscalar pipeline is defined. Then three specific areas are analyzed: register renaming, instruction window wakeup and selection logic, and operand bypassing. Each is modeled and simulated with SPICE for feature sizes of 0.8µm, 0.35µm, and 0.18µm. Performance results and trends are expressed in terms of issue width and window size. Our analysis indicates that window wakeup and selection logic, as well as operand bypass logic, are likely to be the most critical in the future.

A microarchitecture that simplifies wakeup and selection logic is proposed and discussed. This implementation puts chains of dependent instructions into queues and issues instructions from multiple queues in parallel. Simulation shows little slowdown compared with a completely flexible issue window when performance is measured in clock cycles. Furthermore, because only the instructions at the queue heads need to be awakened and selected, the issue logic is simplified and the clock cycle is faster; consequently, overall performance is improved. By grouping dependent instructions together, the proposed microarchitecture will also help minimize performance degradation due to slow bypasses in future wide-issue machines.
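The queue-based issue scheme described in the abstract can be illustrated with a small cycle-level sketch. It is not the paper's implementation: the steering rule (append an instruction behind a producer that sits at a queue tail, otherwise start a new chain in an empty queue, otherwise stall), the uniform one-cycle latency, and the names `steer`, `simulate`, and `EXAMPLE` are all assumptions made for illustration. The sketch also assumes registers are already renamed, so each destination is written once.

```python
from collections import deque

def steer(fifos, program, idx):
    """Pick a FIFO for instruction idx, or None if dispatch must stall.

    Assumed rule (a simplification, not taken from the abstract):
    chain behind a producer at a FIFO tail, else use an empty FIFO.
    """
    _, srcs = program[idx]
    for q in fifos:
        if q and program[q[-1]][0] in srcs:   # tail produces one of our sources
            return q
    for q in fifos:
        if not q:                             # start a new dependence chain
            return q
    return None

def simulate(program, num_fifos=4, width=4, latency=1):
    """Return the issue cycle of each (dest, srcs) instruction.

    Only FIFO heads are checked for readiness each cycle, which is the
    property the abstract credits with simplifying wakeup and selection.
    """
    fifos = [deque() for _ in range(num_fifos)]
    ready_at = {}                             # register -> cycle its value is usable
    issue_cycle = [None] * len(program)
    next_idx = issued = cycle = 0
    while issued < len(program):
        # Dispatch up to `width` instructions in program order.
        for _ in range(width):
            if next_idx == len(program):
                break
            q = steer(fifos, program, next_idx)
            if q is None:
                break                         # no suitable FIFO: stall dispatch
            q.append(next_idx)
            ready_at[program[next_idx][0]] = float("inf")  # result still pending
            next_idx += 1
        # Issue: examine only the head of each FIFO.
        for q in fifos:
            if not q:
                continue
            idx = q[0]
            dest, srcs = program[idx]
            if all(ready_at.get(s, 0) <= cycle for s in srcs):
                q.popleft()
                issue_cycle[idx] = cycle
                ready_at[dest] = cycle + latency
                issued += 1
        cycle += 1
    return issue_cycle

# Two independent two-instruction chains that join in the last instruction.
EXAMPLE = [
    ("r1", ["a"]),
    ("r2", ["r1"]),
    ("r3", ["b"]),
    ("r4", ["r3"]),
    ("r5", ["r2", "r4"]),
]
```

With two queues, `simulate(EXAMPLE, num_fifos=2)` returns `[0, 1, 0, 1, 2]`: the two chains issue in parallel from separate queue heads. With a single queue, `simulate(EXAMPLE, num_fifos=1)` returns `[0, 1, 2, 3, 4]`, because the second chain has to wait for the first to drain.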



