Predict-More Router: A Low Latency NoC Router with More Route Predictions

2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum. Pub Date: 2013-05-20. DOI: 10.1109/IPDPSW.2013.40

Yuan He, Hiroshi Sasaki, Shinobu Miwa, Hiroshi Nakamura


Citations: 6

Abstract

Network-on-Chip (NoC) is a critical part of the memory hierarchy of emerging multicores. Lowering its communication latency while preserving its bandwidth is key to achieving high system performance. To date, one of the most effective methods for achieving this goal is the prediction router (PR). PR predicts the route an incoming packet will take, speculatively allocates resources (virtual channels and the switch crossbar) to the packet, and traverses the packet's flits along the predicted route in a single cycle without waiting for route computation. If the prediction misses, however, the packet is processed in the conventional pipeline (four cycles in our work) and the speculatively allocated router resources are wasted. Prediction accuracy therefore determines the number of successful predictions, the latency reduction, and the bandwidth consumption. We find that predictions hit only around 65% of the time for most applications even under the best algorithm, so PR can accelerate at most about 65% of the packets while the remaining 35% consume extra router resources and bandwidth. To increase prediction accuracy, we propose a technique that applies multiple prediction algorithms simultaneously to each incoming packet, which yields a more accurate prediction. Based on this proposal, we design and implement the predict-more router (PmR). While effectively increasing prediction accuracy, PmR also uses the remaining bandwidth within the router more productively. When PmR and PR are both evaluated under their best algorithm(s), PmR is over 15% higher in prediction accuracy than PR, which helps PmR outperform PR by 3.5% on average in system speedup. We also find that although PmR creates more contention among predictions, this contention is well resolved and kept within the router, so neither router-internal bandwidth nor link bandwidth is adversely affected.
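
As a rough illustration of why prediction accuracy matters, the sketch below computes the average per-router latency implied by a given hit rate and shows how issuing several predictions per packet can only raise the hit rate on a given trace. It is not the authors' implementation. The assumptions are that a miss costs exactly the four-cycle conventional pipeline with no extra penalty, that contention between simultaneous predictions is ignored, and that the two predictors (`last_port`, `most_frequent_port`) and the port trace are hypothetical stand-ins rather than the algorithms evaluated in the paper.

```python
from collections import Counter
from typing import Callable, Optional, Sequence

HIT_CYCLES = 1   # single-cycle traversal on a correct prediction (from the abstract)
MISS_CYCLES = 4  # conventional pipeline depth in the authors' work (from the abstract)

# A predictor maps the history of output ports to a guessed port (or None).
Predictor = Callable[[Sequence[int]], Optional[int]]


def expected_router_cycles(hit_rate: float) -> float:
    """Average cycles per router, assuming a miss costs exactly MISS_CYCLES."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be in [0, 1]")
    return hit_rate * HIT_CYCLES + (1.0 - hit_rate) * MISS_CYCLES


def last_port(history: Sequence[int]) -> Optional[int]:
    """Hypothetical predictor: guess the most recently used output port."""
    return history[-1] if history else None


def most_frequent_port(history: Sequence[int]) -> Optional[int]:
    """Hypothetical predictor: guess the most often used output port so far."""
    return Counter(history).most_common(1)[0][0] if history else None


def hit_rate(trace: Sequence[int], predictors: Sequence[Predictor]) -> float:
    """Fraction of packets whose actual port matches any predictor's guess."""
    if not trace:
        raise ValueError("trace must not be empty")
    hits = 0
    history: list[int] = []
    for actual in trace:
        guesses = {predict(history) for predict in predictors}
        hits += actual in guesses  # a hit if at least one prediction is right
        history.append(actual)
    return hits / len(trace)


if __name__ == "__main__":
    trace = [0, 0, 1, 0, 0, 1, 0, 0]  # made-up sequence of output ports
    for name, preds in [
        ("last_port only", [last_port]),
        ("most_frequent_port only", [most_frequent_port]),
        ("both predictors", [last_port, most_frequent_port]),
    ]:
        rate = hit_rate(trace, preds)
        print(f"{name}: hit rate {rate:.3f}, "
              f"avg cycles {expected_router_cycles(rate):.2f}")
```

Under these assumptions, a 65% hit rate corresponds to 0.65 × 1 + 0.35 × 4 = 2.05 cycles per router on average. The model deliberately leaves out the cost the abstract attributes to extra predictions, namely contention for speculatively allocated resources, so it shows only the accuracy side of the trade-off.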

