{"title":"The Ultrascalar processor-an asymptotically scalable superscalar microarchitecture","authors":"Dana S. Henry, Bradley C. Kuszmaul, V. Viswanath","doi":"10.1109/ARVLSI.1999.756053","DOIUrl":null,"url":null,"abstract":"The poor scalability of existing superscalar processors has been of great concern to the computer engineering community. In particular the critical-path lengths of many components in existing implementations grow as /spl Theta/(n/sup 2/) where n is the fetch width, the issue width, or the window size. This paper presents a novel implementation, called the Ultrascalar processor, that dramatically reduces the asymptotic critical-path length of a superscalar processor. The processor is implemented by a large collection of ALUs with controllers (together called execution stations) connected together by a network of parallel-prefix tree circuits. A fat-tree network connects an interleaved cache to the execution stations. These networks provide the full functionality of superscalar processors including renaming, out-of-order execution, and speculative execution. The Ultrascalar's critical-path length due to gate delays is /spl tau//sub gates/=/spl Theta/(log n). The wire delays and chip size depend on the provided memory bandwidth and the layout. If the provided memory bandwidth is M(n) memory operations per clock cycle then, using an H-tree VLSI layout, the critical-path length due to wire delay (speed-of-light delay) is /spl tau//sub wires/={/spl Theta/(n/sup 1/2/) if M(n) is O(n/sup 1/2-/spl epsiv//) for /spl epsiv/>0, [optimal]; {/spl Theta/(n/sup 1/2/log n) if M(n) is /spl Theta/(n/sup 1/2/), [near optimal]; and {/spl Theta/(M(n)) if M(n) is /spl Omega/(n/sup 1/2+/spl epsiv//) for /spl epsiv/>0, [optimal] (with M suitably constrained.) The area is the square of the wire delay.","PeriodicalId":358015,"journal":{"name":"Proceedings 20th Anniversary Conference on Advanced Research in VLSI","volume":"13 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1999-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"19","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings 20th Anniversary Conference on Advanced Research in VLSI","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ARVLSI.1999.756053","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 19
Abstract
The poor scalability of existing superscalar processors has been of great concern to the computer engineering community. In particular the critical-path lengths of many components in existing implementations grow as /spl Theta/(n/sup 2/) where n is the fetch width, the issue width, or the window size. This paper presents a novel implementation, called the Ultrascalar processor, that dramatically reduces the asymptotic critical-path length of a superscalar processor. The processor is implemented by a large collection of ALUs with controllers (together called execution stations) connected together by a network of parallel-prefix tree circuits. A fat-tree network connects an interleaved cache to the execution stations. These networks provide the full functionality of superscalar processors including renaming, out-of-order execution, and speculative execution. The Ultrascalar's critical-path length due to gate delays is /spl tau//sub gates/=/spl Theta/(log n). The wire delays and chip size depend on the provided memory bandwidth and the layout. If the provided memory bandwidth is M(n) memory operations per clock cycle then, using an H-tree VLSI layout, the critical-path length due to wire delay (speed-of-light delay) is /spl tau//sub wires/={/spl Theta/(n/sup 1/2/) if M(n) is O(n/sup 1/2-/spl epsiv//) for /spl epsiv/>0, [optimal]; {/spl Theta/(n/sup 1/2/log n) if M(n) is /spl Theta/(n/sup 1/2/), [near optimal]; and {/spl Theta/(M(n)) if M(n) is /spl Omega/(n/sup 1/2+/spl epsiv//) for /spl epsiv/>0, [optimal] (with M suitably constrained.) The area is the square of the wire delay.