The Ultrascalar processor-an asymptotically scalable superscalar microarchitecture

Dana S. Henry, Bradley C. Kuszmaul, V. Viswanath
Published in: Proceedings 20th Anniversary Conference on Advanced Research in VLSI, 1999-03-21
DOI: 10.1109/ARVLSI.1999.756053 (https://doi.org/10.1109/ARVLSI.1999.756053)
Citations: 19

Abstract

The poor scalability of existing superscalar processors has been of great concern to the computer engineering community. In particular, the critical-path lengths of many components in existing implementations grow as $\Theta(n^2)$, where $n$ is the fetch width, the issue width, or the window size. This paper presents a novel implementation, called the Ultrascalar processor, that dramatically reduces the asymptotic critical-path length of a superscalar processor. The processor is implemented by a large collection of ALUs with controllers (together called execution stations) connected together by a network of parallel-prefix tree circuits. A fat-tree network connects an interleaved cache to the execution stations. These networks provide the full functionality of superscalar processors including renaming, out-of-order execution, and speculative execution. The Ultrascalar's critical-path length due to gate delays is $\tau_{\text{gates}} = \Theta(\log n)$. The wire delays and chip size depend on the provided memory bandwidth and the layout. If the provided memory bandwidth is $M(n)$ memory operations per clock cycle then, using an H-tree VLSI layout, the critical-path length due to wire delay (speed-of-light delay) is

$$
\tau_{\text{wires}} =
\begin{cases}
\Theta(n^{1/2}) & \text{if } M(n) \text{ is } O(n^{1/2-\varepsilon}) \text{ for } \varepsilon > 0 \text{ [optimal];} \\
\Theta(n^{1/2} \log n) & \text{if } M(n) \text{ is } \Theta(n^{1/2}) \text{ [near optimal]; and} \\
\Theta(M(n)) & \text{if } M(n) \text{ is } \Omega(n^{1/2+\varepsilon}) \text{ for } \varepsilon > 0 \text{ [optimal]}
\end{cases}
$$

(with $M$ suitably constrained). The area is the square of the wire delay.
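
To illustrate why a parallel-prefix network can give a $\Theta(\log n)$ gate-delay critical path, the following is a minimal software sketch, not the paper's circuit. It assumes a step-doubling (Hillis-Steele style) scan instead of the tree circuits the abstract mentions; both have logarithmic depth. The names `parallel_prefix` and `latest_writer`, and the idea of modelling one register's value as "the most recent write in program order", are illustrative assumptions rather than details taken from the abstract.

```python
from typing import Callable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def parallel_prefix(values: List[T], op: Callable[[T, T], T]) -> Tuple[List[T], int]:
    """Inclusive scan with an associative operator.

    Returns the scanned list and the number of rounds used. Each round
    stands in for one level of logic that hardware would evaluate in
    parallel, so the round count grows as ceil(log2(n)).
    """
    result = list(values)
    rounds = 0
    stride = 1
    while stride < len(result):
        prev = list(result)  # snapshot: all positions update simultaneously
        for i in range(stride, len(result)):
            result[i] = op(prev[i - stride], prev[i])
        stride *= 2
        rounds += 1
    return result, rounds


def latest_writer(older: Optional[int], newer: Optional[int]) -> Optional[int]:
    """Associative operator (hypothetical): the newer write wins if there is one."""
    return newer if newer is not None else older


# One entry per execution station, in program order. An integer means the
# station writes that value to the register; None means it does not write.
writes = [7, None, None, 3, None, 9, None, None]
visible, rounds = parallel_prefix(writes, latest_writer)
# visible[i] is the value of the register as seen after station i.
```

With 8 stations the scan finishes in 3 rounds, and doubling the number of stations adds only one more round, which is the logarithmic growth the gate-delay bound refers to.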