Quest for high-performance bufferless NoCs with single-cycle express paths and self-learning throttling
Bhavya K. Daya, L. Peh, A. Chandrakasan
2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC), 2016-06-05
DOI: 10.1145/2897937.2898075 (https://doi.org/10.1145/2897937.2898075)
Citations: 21
Abstract
Router buffers are the main reason for the scalable bandwidth of a Network-on-Chip (NoC), but they consume significant area and power. The SCEPTER bufferless NoC sets up single-cycle virtual express paths dynamically, allowing packets to traverse non-minimal paths without a latency penalty. Using prioritization, bypassing, and throttling mechanisms, we maximize opportunities to use these paths while pushing bandwidth. For 64 and 256 nodes, we achieve 62% lower latency, 1.3× higher throughput, and 35% lower starvation than a baseline bufferless NoC under synthetic traffic. Full-system 36-core simulations show 19% lower runtime and performance on par with a buffered network, with 36% lower area and 33% lower power.