High-radix on-chip networks with low-radix routers

2014 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), Pub Date: 2014-11-03, DOI: 10.1109/ICCAD.2014.7001365

Animesh Jain, Ritesh Parikh, V. Bertacco


Citations: 13

Abstract

Networks-on-chip (NoCs) have become increasingly widespread in recent years due to the extensive integration of many components in modern multicore processors and SoC designs. One of the fundamental tradeoffs in NoC design is the radix of its constituent routers. While high-radix routers enable a richly connected, low-diameter network, low-radix routers allow for a small silicon area. Since the NoC consumes a significant portion of the on-chip resources, naïvely deploying an expensive high-radix network is not a practical option. In this work, we present a novel solution that provides high-radix-like performance at a cost similar to that of a low-radix network. Our solution exploits the irregularity in runtime communication patterns to provide short, low-latency paths between frequently communicating nodes, while infrequently communicating pairs rely on longer paths. To this end, it uses a flexible topology reconfiguration infrastructure in which abundant links between routers (as in a high-radix topology) are decoupled from scarce router ports (as in a low-radix topology). Network links are bound to router ports at runtime to form connected and deadlock-free topologies. Binding selections are based on the observed traffic patterns, which are synthesized through a distributed statistics-collection framework. Our experiments on a 64-node CMP running multiprogrammed workloads show that we can reduce average network latency by 19% over an area- and power-comparable mesh NoC. The latency improvements for non-uniform synthetic traffic are above 30%.
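The idea of binding many physical links to a few router ports can be illustrated with a small sketch. The code below is not the authors' algorithm; it is a hypothetical greedy heuristic written only to show the concept. It assumes a caller-supplied backbone (here a ring) that is bound first to keep the topology connected, after which leftover ports go to the candidate links with the most observed traffic. The function names (`bind_links`, `hops`), the 6-node example, and the traffic numbers are all invented for illustration, and deadlock freedom is not modeled.

```python
from collections import deque


def norm(u, v):
    """Represent an undirected link as an ordered (low, high) tuple."""
    return (u, v) if u < v else (v, u)


def bind_links(num_nodes, backbone, candidates, traffic, ports_per_router):
    """Hypothetical greedy binding of links to a limited number of ports.

    backbone:   links bound first so that the topology stays connected
    candidates: extra physical links that may be bound if ports remain
    traffic:    dict mapping a normalized link to an observed packet count
    """
    free_ports = [ports_per_router] * num_nodes
    bound = set()

    def bind(u, v):
        link = norm(u, v)
        if link in bound or free_ports[u] == 0 or free_ports[v] == 0:
            return False
        bound.add(link)
        free_ports[u] -= 1
        free_ports[v] -= 1
        return True

    # Step 1: the backbone guarantees connectivity (assumed to be connected).
    for u, v in backbone:
        if not bind(u, v):
            raise ValueError(f"backbone link {(u, v)} exceeds the port budget")

    # Step 2: spend the remaining ports on the busiest candidate links.
    ranked = sorted(candidates, key=lambda l: traffic.get(norm(*l), 0),
                    reverse=True)
    for u, v in ranked:
        bind(u, v)  # silently skipped when either endpoint has no free port
    return bound


def hops(num_nodes, links, src, dst):
    """Breadth-first search hop count between two routers (None if unreachable)."""
    adj = [[] for _ in range(num_nodes)]
    for u, v in links:
        adj[u].append(v)
        adj[v].append(u)
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node]
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None


# Illustrative 6-router example: ring backbone, 3 ports per router.
ring = [(i, (i + 1) % 6) for i in range(6)]
candidates = [(0, 3), (1, 4), (0, 2), (2, 5)]
traffic = {(0, 3): 100, (2, 5): 90, (1, 4): 80, (0, 2): 50}
topology = bind_links(6, ring, candidates, traffic, ports_per_router=3)
```

In this toy example the frequently communicating pair (0, 3) gets a direct link, so its distance drops from three hops on the ring to one, while the low-traffic link (0, 2) is left unbound because router 0 has no free port.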


