{"title":"Stream architectures - efficiency and programmability","authors":"M. Erez","doi":"10.1109/ISSOC.2004.1411141","DOIUrl":null,"url":null,"abstract":"Summary form only given. Stream processors are fully programmable in a high-level language, yet are capable of achieving computation efficiency comparable to fixed-function ASIC solutions (about 20 pJ/op) and can be scaled from a Gop/s (20 mW) block to a Top/s (20 W) chip in current semiconductor technology. The parallel nature of stream processors enables their performance to scale with technology. In a 2010 45 nm technology we expect an efficiency of 1 pJ/op and performance of up to 20 Top/s (20 W). A stream processor contains an array of arithmetic units that are supplied with data by a deep and explicit register hierarchy, which also serves to decouple instruction execution from unpredictable and long-latency memory operations. This decoupled and exposed-communication architecture enables a compiler to automatically map a stream application (such as a signal-flow graph) to the processing array: employing \"stream scheduling\" to stage the high-level movement of streams, and \"communication scheduling\" to schedule the data movement in the low-level kernels. This explicit optimization of communication results in almost all data and instruction movement taking place over short wires, and hence almost all energy going to useful computation. We have built a prototype streaming signal processor, Imagine, and have demonstrated streaming applications involving video compression/decompression, wireless communication, and adaptive beam-forming. We are also designing the Merrimac supercomputer, which uses a stream processor based on the same architectural principles as Imagine, illustrating the flexibility, generality, and scalability of the streaming concept. This paper describes stream architectures, stream programming systems, and streaming applications. A comparison is made to conventional DSPs, FPGAs, and ASIC solutions.","PeriodicalId":268122,"journal":{"name":"2004 International Symposium on System-on-Chip, 2004. Proceedings.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2004-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2004 International Symposium on System-on-Chip, 2004. Proceedings.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISSOC.2004.1411141","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2
Abstract
Summary form only given. Stream processors are fully programmable in a high-level language, yet are capable of achieving computation efficiency comparable to fixed-function ASIC solutions (about 20 pJ/op) and can be scaled from a Gop/s (20 mW) block to a Top/s (20 W) chip in current semiconductor technology. The parallel nature of stream processors enables their performance to scale with technology. In a 2010 45 nm technology we expect an efficiency of 1 pJ/op and performance of up to 20 Top/s (20 W). A stream processor contains an array of arithmetic units that are supplied with data by a deep and explicit register hierarchy, which also serves to decouple instruction execution from unpredictable and long-latency memory operations. This decoupled and exposed-communication architecture enables a compiler to automatically map a stream application (such as a signal-flow graph) to the processing array: employing "stream scheduling" to stage the high-level movement of streams, and "communication scheduling" to schedule the data movement in the low-level kernels. This explicit optimization of communication results in almost all data and instruction movement taking place over short wires, and hence almost all energy going to useful computation. We have built a prototype streaming signal processor, Imagine, and have demonstrated streaming applications involving video compression/decompression, wireless communication, and adaptive beam-forming. We are also designing the Merrimac supercomputer, which uses a stream processor based on the same architectural principles as Imagine, illustrating the flexibility, generality, and scalability of the streaming concept. This paper describes stream architectures, stream programming systems, and streaming applications. A comparison is made to conventional DSPs, FPGAs, and ASIC solutions.