{"title":"Simple virtual channel allocation for high throughput and high frequency on-chip routers","authors":"Yi Xu, Bo Zhao, Youtao Zhang, Jun Yang","doi":"10.1145/2742349","DOIUrl":null,"url":null,"abstract":"Technology scaling has led to the integration of many cores into a single chip. As a result, on-chip interconnection networks start to play a more and more important role in determining the performance and power of the entire chip. Packet-switched network-on-chip (NoC) has provided a scalable solution to the communications for tiled multi-core processors. However the virtual-channel (VC) buffers in the NoC consume significant dynamic and leakage power of the system. To improve the energy efficiency of the router design, it is advantageous to use small buffer sizes while still maintaining throughput of the network. This paper proposes two new virtual channel allocation (VA) mechanisms, termed Fixed VC Assignment with Dynamic VC Allocation (FVADA) and Adjustable VC Assignment with Dynamic VC Allocation (AVADA). The idea is that VCs are assigned based on the designated output port of a packet to reduce the Head-of-Line (HoL) blocking. Also, the number of VCs allocated for each output port can be adjusted dynamically. Unlike previous buffer-pool based designs, we only use a small number of VCs to keep the arbitration latency low. Simulation results show that FVADA and AVADA can improve the network throughput by 41% on average, compared to a baseline design with the same buffer size. AVADA can still outperform the baseline even when our buffer size is halved. Moreover, we are able to achieve comparable or better throughput than a previous dynamic VC allocator while reducing its critical path delay by 60%. Our results prove that the proposed VA mechanisms are suitable for low-power, high-throughput, and high-frequency on-chip network designs.","PeriodicalId":368621,"journal":{"name":"HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture","volume":"27 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"51","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2742349","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 51
Abstract
Technology scaling has led to the integration of many cores into a single chip. As a result, on-chip interconnection networks start to play a more and more important role in determining the performance and power of the entire chip. Packet-switched network-on-chip (NoC) has provided a scalable solution to the communications for tiled multi-core processors. However the virtual-channel (VC) buffers in the NoC consume significant dynamic and leakage power of the system. To improve the energy efficiency of the router design, it is advantageous to use small buffer sizes while still maintaining throughput of the network. This paper proposes two new virtual channel allocation (VA) mechanisms, termed Fixed VC Assignment with Dynamic VC Allocation (FVADA) and Adjustable VC Assignment with Dynamic VC Allocation (AVADA). The idea is that VCs are assigned based on the designated output port of a packet to reduce the Head-of-Line (HoL) blocking. Also, the number of VCs allocated for each output port can be adjusted dynamically. Unlike previous buffer-pool based designs, we only use a small number of VCs to keep the arbitration latency low. Simulation results show that FVADA and AVADA can improve the network throughput by 41% on average, compared to a baseline design with the same buffer size. AVADA can still outperform the baseline even when our buffer size is halved. Moreover, we are able to achieve comparable or better throughput than a previous dynamic VC allocator while reducing its critical path delay by 60%. Our results prove that the proposed VA mechanisms are suitable for low-power, high-throughput, and high-frequency on-chip network designs.