Architectures for one billion of transistors
M. Valero
International Symposium on Systems Synthesis, 2000-09-20
DOI: 10.1145/501790.501805 (https://doi.org/10.1145/501790.501805)
Citation count: 2
Abstract
Transistor budgets have been increasing at a very fast pace in recent years. This increasing transistor density will give next-generation processors a billion transistors to work with. It is the task of the computer architect to find the best way to use them.

Out-of-order superscalar processors exploit parallelism at the finest grain, Instruction Level Parallelism (ILP). They issue multiple instructions per cycle, often in an order other than the one specified by the programmer, and use branch prediction and other speculative execution techniques to increase the available parallelism.

Very Long Instruction Word (VLIW) processors also exploit parallelism at the instruction level, but they rely mostly on the compiler to detect the available parallelism. This increased compiler role allows a simpler design that can run at a faster clock rate, compensating for the loss of ILP.

Chip Multiprocessors (CMP) join several narrow superscalar/VLIW components into a single processor and rely mostly on Thread Level Parallelism (TLP) for performance. The small, simple components can also run at a faster clock rate, compensating for the loss of ILP and significantly increasing throughput.

Simultaneous Multithreaded (SMT) processors are based on wide superscalars and exploit both ILP and TLP by issuing instructions from several different threads to the same pipeline, obtaining the benefits of TLP without sacrificing ILP on single-threaded applications.

For each of these options we will need to find the best balance between performance, design complexity, and power consumption, among other factors. Also, the frontiers between them are not clear, and many intermediate design points can be found that lead to better, simpler, or cheaper processors for the next generation of high-performance computers.
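As a rough illustration of the trade-off between a wide superscalar, a CMP, and an SMT design, the following toy sketch counts how many issue slots each organization fills. It is not taken from the paper. The function names, the per-cycle "ready instruction" lists, and the widths are all hypothetical, and the model deliberately ignores clock rate, dependencies, caches, and power, so it says nothing about the clock-rate compensation mentioned for VLIW and CMP.

```python
from typing import List


def issued_superscalar(ready: List[int], width: int) -> int:
    """Instructions issued by one core of the given width running one thread.

    ready[i] is a hypothetical count of instructions that could issue in cycle i.
    """
    return sum(min(width, r) for r in ready)


def issued_cmp(threads: List[List[int]], core_width: int) -> int:
    """One narrow core per thread; a core can only fill its own issue slots."""
    return sum(issued_superscalar(ready, core_width) for ready in threads)


def issued_smt(threads: List[List[int]], width: int) -> int:
    """One wide core; each cycle the slots are filled from the threads in order."""
    total = 0
    for cycle_ready in zip(*threads):
        free = width
        for r in cycle_ready:
            take = min(free, r)
            total += take
            free -= take
    return total


# Hypothetical workloads: ready instructions per cycle for two threads.
thread_a = [4, 1, 0, 3]
thread_b = [1, 2, 4, 0]

print("wide superscalar, thread A only:", issued_superscalar(thread_a, 4))
print("CMP, two 2-wide cores:", issued_cmp([thread_a, thread_b], 2))
print("SMT, one 4-wide core:", issued_smt([thread_a, thread_b], 4))
print("SMT, thread A only:", issued_smt([thread_a], 4))
```

With these made-up inputs, the SMT core fills slots that a single thread leaves empty, and with only one thread it behaves like the wide superscalar. The two narrow CMP cores lose some of thread A's burst parallelism in this model, which is the loss of ILP that the abstract says a faster clock is meant to offset.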