The Ultrascalar processor: an asymptotically scalable superscalar microarchitecture

Proceedings 20th Anniversary Conference on Advanced Research in VLSI | Pub Date: 1999-03-21 | DOI: 10.1109/ARVLSI.1999.756053

Dana S. Henry, Bradley C. Kuszmaul, V. Viswanath


Citations: 19

Abstract

The poor scalability of existing superscalar processors has been of great concern to the computer engineering community. In particular, the critical-path lengths of many components in existing implementations grow as $\Theta(n^2)$, where $n$ is the fetch width, the issue width, or the window size. This paper presents a novel implementation, called the Ultrascalar processor, that dramatically reduces the asymptotic critical-path length of a superscalar processor. The processor is implemented by a large collection of ALUs with controllers (together called execution stations) connected together by a network of parallel-prefix tree circuits. A fat-tree network connects an interleaved cache to the execution stations. These networks provide the full functionality of superscalar processors including renaming, out-of-order execution, and speculative execution. The Ultrascalar's critical-path length due to gate delays is $\tau_{\text{gates}} = \Theta(\log n)$. The wire delays and chip size depend on the provided memory bandwidth and the layout. If the provided memory bandwidth is $M(n)$ memory operations per clock cycle then, using an H-tree VLSI layout, the critical-path length due to wire delay (speed-of-light delay) is

$$
\tau_{\text{wires}} =
\begin{cases}
\Theta(n^{1/2}) & \text{if } M(n) \text{ is } O(n^{1/2-\varepsilon}) \text{ for } \varepsilon > 0 \quad \text{[optimal]}; \\
\Theta(n^{1/2} \log n) & \text{if } M(n) \text{ is } \Theta(n^{1/2}) \quad \text{[near optimal]}; \\
\Theta(M(n)) & \text{if } M(n) \text{ is } \Omega(n^{1/2+\varepsilon}) \text{ for } \varepsilon > 0 \quad \text{[optimal]}
\end{cases}
$$

(with $M$ suitably constrained). The area is the square of the wire delay.
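To make the $\Theta(\log n)$ gate-delay claim more concrete, the following is a minimal software sketch of a generic parallel-prefix (scan) computation. It is not the paper's circuit: the names `parallel_prefix` and `latest_writer` are hypothetical, and the "latest writer" operator is only an assumed, simplified stand-in for how a prefix network might forward the most recent value of one register to later execution stations. The point it illustrates is that an associative operator can be combined across $n$ positions in about $\log_2 n$ levels, where every combination within a level is independent of the others.

```python
from typing import Callable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def parallel_prefix(values: List[T], op: Callable[[T, T], T]) -> Tuple[List[T], int]:
    """Inclusive scan by repeated doubling.

    Returns the scanned list and the number of levels used. Each level
    reads only the previous level's results, so a hardware realisation
    could evaluate all combinations of one level at the same time.
    `op` must be associative; the left argument is the earlier position.
    """
    result = list(values)
    depth = 0
    stride = 1
    while stride < len(result):
        # Build the next level from the current one (no in-place updates).
        result = [
            op(result[i - stride], result[i]) if i >= stride else result[i]
            for i in range(len(result))
        ]
        stride *= 2
        depth += 1
    return result, depth


def latest_writer(earlier: Optional[int], later: Optional[int]) -> Optional[int]:
    """Assumed illustrative operator: the later write wins; None means
    'this station does not write the register'."""
    return later if later is not None else earlier


if __name__ == "__main__":
    # Prefix sums over 8 positions need 3 levels (log2 of 8).
    print(parallel_prefix([1, 2, 3, 4, 5, 6, 7, 8], lambda a, b: a + b))
    # Stations 0 and 3 write a register; the others pass the value along.
    print(parallel_prefix([5, None, None, 7, None], latest_writer))
```

This doubling form performs on the order of $n \log n$ operator applications. Tree-structured prefix circuits reach the same logarithmic depth with only a linear number of operators, which is the more relevant shape when the network has to be laid out on a chip.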



