Architectures for one billion of transistors

International Symposium on Systems Synthesis Pub Date : 2000-09-20 DOI:10.1145/501790.501805

M. Valero


Citations: 2

Abstract

Transistor budgets have been increasing at a very fast pace in recent years. This increasing transistor density will give next-generation processors a budget of a billion transistors. It is the task of the computer architect to find the best way to use them.

Out-of-order superscalar processors exploit parallelism at the finest grain, Instruction Level Parallelism (ILP). They issue multiple instructions per cycle, often in an order other than the one specified by the programmer, and use branch prediction and other speculative execution techniques to increase the available parallelism.

Very Long Instruction Word (VLIW) processors also exploit parallelism at the instruction level, but they mostly rely on the compiler to detect the available parallelism. This larger compiler role allows a simpler hardware design that can run at a faster clock rate, compensating for the loss of ILP.

Chip Multiprocessors (CMP) join several narrow superscalar/VLIW components into a single processor and mostly rely on Thread Level Parallelism (TLP) for performance. The small and simple components can also run at a faster clock rate, compensating for the loss of ILP and significantly increasing throughput.

Simultaneous Multithreaded (SMT) processors are based on wide superscalars and exploit both ILP and TLP by issuing instructions from several different threads to the same pipeline, obtaining the benefits of TLP without sacrificing ILP on single-threaded applications.

For each of these options we will need to find the best balance between performance, design complexity, and power consumption, among other factors. Also, the frontiers between them are not clear, and many intermediate design points can be found that lead to better, simpler, or cheaper processors for the next generation of high-performance computers.
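The ILP versus TLP trade-off described in the abstract can be made concrete with a toy throughput model. The sketch below is illustrative only and is not taken from the paper: it assumes throughput is simply busy cores × instructions per cycle × clock rate, with one thread per core, and every number in it (core counts, IPC, clock rates) is hypothetical. The `Design` class and its fields are names introduced here for the example. SMT is deliberately left out, since modelling shared issue slots would need more assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    """A hypothetical design point; all values are made up for illustration."""
    name: str
    cores: int            # independent cores on the chip
    ipc_per_core: float   # assumed sustained instructions per cycle per core
    clock_ghz: float      # assumed clock rate in GHz

    def throughput(self, threads: int) -> float:
        """Billions of instructions per second, assuming one thread per core."""
        busy_cores = min(self.cores, threads)
        return busy_cores * self.ipc_per_core * self.clock_ghz


# One wide core with high ILP versus four narrow cores at a faster clock.
wide = Design("wide superscalar", cores=1, ipc_per_core=3.0, clock_ghz=1.0)
cmp_chip = Design("CMP, 4 narrow cores", cores=4, ipc_per_core=1.5, clock_ghz=1.25)

if __name__ == "__main__":
    for threads in (1, 4):
        for design in (wide, cmp_chip):
            rate = design.throughput(threads)
            print(f"{threads} thread(s), {design.name}: {rate:.3f} G instr/s")
```

Under these assumed numbers the wide core wins with a single thread, while the CMP wins once four threads are available. That is the qualitative point the abstract makes; the crossover in a real design depends on factors this model ignores, such as memory behaviour, power, and design complexity.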
