High-radix on-chip networks with low-radix routers
Animesh Jain, Ritesh Parikh, V. Bertacco
2014 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), published 2014-11-03
DOI: 10.1109/ICCAD.2014.7001365 (https://doi.org/10.1109/ICCAD.2014.7001365)
Citation count: 13
Abstract
Networks-on-chip (NoCs) have become increasingly widespread in recent years due to the extensive integration of many components in modern multicore processors and SoC designs. One of the fundamental tradeoffs in NoC design is the radix of its constituent routers: high-radix routers enable a richly connected, low-diameter network, while low-radix routers keep silicon area small. Since the NoC consumes a significant portion of on-chip resources, naïvely deploying an expensive high-radix network is not a practical option. In this work, we present a novel solution that provides high-radix-like performance at a cost similar to that of a low-radix network. Our solution exploits the irregularity of runtime communication patterns to provide short, low-latency paths between frequently communicating nodes, while infrequently communicating pairs rely on longer paths. To this end, it uses a flexible topology-reconfiguration infrastructure in which abundantly available links between routers (as in a high-radix topology) are decoupled from scarcely available router ports (as in a low-radix topology). Network links are bound to router ports at runtime to form connected, deadlock-free topologies. Binding selections are based on the observed traffic patterns, which are collected and summarized by a distributed statistics-collection framework. Our experiments on a 64-node CMP running multiprogrammed workloads show that this approach reduces average network latency by 19% over a mesh NoC of comparable area and power. For non-uniform synthetic traffic, the latency improvements exceed 30%.
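The abstract does not spell out how links are bound to ports, so the Python sketch below only illustrates the general idea under loudly stated assumptions. The routers form a k x k grid; the "abundant" link set is assumed to be every pair of routers sharing a row or column (a flattened-butterfly-style choice); each router is assumed to have 4 network ports, as in a 2D mesh; a serpentine backbone keeps the network connected; and a greedy pass spends the remaining ports on the links with the most observed traffic. All function names and traffic numbers are hypothetical, hop count is used only as a rough latency proxy, and deadlock freedom (which the paper's topologies guarantee) is not handled here; a real design would also need a deadlock-free routing function.

```python
from collections import deque
from itertools import combinations


def node_id(r, c, k):
    """Row-major router index in a k x k grid."""
    return r * k + c


def candidate_links(k):
    """Assumed abundant link set: every router pair sharing a row or column."""
    links = set()
    for i in range(k):
        for a, b in combinations(range(k), 2):
            links.add((node_id(i, a, k), node_id(i, b, k)))  # same row
            links.add((node_id(a, i, k), node_id(b, i, k)))  # same column
    return links


def mesh_links(k):
    """Baseline 2D mesh: nearest-neighbour links only."""
    links = set()
    for r in range(k):
        for c in range(k):
            if c + 1 < k:
                links.add((node_id(r, c, k), node_id(r, c + 1, k)))
            if r + 1 < k:
                links.add((node_id(r, c, k), node_id(r + 1, c, k)))
    return links


def snake_path(k):
    """Serpentine path through all routers. Each step joins mesh neighbours,
    so it uses at most 2 ports per router and keeps every router reachable."""
    order = []
    for r in range(k):
        cols = range(k) if r % 2 == 0 else range(k - 1, -1, -1)
        order.extend(node_id(r, c, k) for c in cols)
    return [tuple(sorted(pair)) for pair in zip(order, order[1:])]


def bind_links(k, traffic, ports=4):
    """Hypothetical greedy binding (requires ports >= 2): keep the snake
    backbone, then bind the highest-traffic candidate links while both
    endpoints still have free ports. Deadlock freedom is NOT addressed."""
    degree = [0] * (k * k)
    bound = set()

    def bind(u, v):
        bound.add((u, v))
        degree[u] += 1
        degree[v] += 1

    for u, v in snake_path(k):
        bind(u, v)

    def demand(link):
        u, v = link
        return traffic.get((u, v), 0) + traffic.get((v, u), 0)

    for u, v in sorted(candidate_links(k) - bound, key=demand, reverse=True):
        if demand((u, v)) == 0:
            break  # remaining links carry no observed traffic
        if degree[u] < ports and degree[v] < ports:
            bind(u, v)
    return bound


def bfs_hops(links, n, src):
    """Hop distance from src to every router (-1 if unreachable)."""
    adj = [[] for _ in range(n)]
    for u, v in links:
        adj[u].append(v)
        adj[v].append(u)
    dist = [-1] * n
    dist[src] = 0
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] < 0:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def avg_weighted_hops(links, traffic, n):
    """Traffic-weighted average hop count, a rough proxy for latency."""
    total = sum(traffic.values())
    return sum(f * bfs_hops(links, n, s)[d] for (s, d), f in traffic.items()) / total


if __name__ == "__main__":
    k = 4
    # Hypothetical per-pair flit counts standing in for collected statistics.
    traffic = {(0, 3): 100, (0, 12): 80, (5, 6): 1}
    print("mesh:", round(avg_weighted_hops(mesh_links(k), traffic, k * k), 3))
    print("reconfigured:", avg_weighted_hops(bind_links(k, traffic), traffic, k * k))
```

On this deliberately skewed traffic the script prints a weighted average of 2.989 hops for the mesh and 1.0 for the reconfigured network. The tradeoff the abstract describes is visible in the design: the backbone alone is sparser than a mesh, so pairs with little observed traffic may take longer paths.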