{"title":"Tree-Mesh Heterogeneous Topology for Low-Latency NoC","authors":"Sung-Wook Han, Jinho Lee, Kiyoung Choi","doi":"10.1145/2685342.2685346","DOIUrl":"https://doi.org/10.1145/2685342.2685346","url":null,"abstract":"In Network-on-Chip (NoC), topology is one of the most important design choices that determine performance and power consumption. Mesh, being the most popular NoC topology for many researches and products, is mainly tailored towards high throughput. However, many researches show that NoCs rarely operate under heavy load and that latency is often much more critical in practice. In this paper, we show that by adding a small tree network to assist the baseline mesh network, the zero-load latency can be greatly reduced while still maintaining the high throughput. For the management of the hybrid network, we propose a novel algorithm to steer each packet to different networks based on hop-count gain and contention monitoring. Experimental results show improvement on not only synthetic traffic but also real application workloads.","PeriodicalId":344147,"journal":{"name":"Network on Chip Architectures","volume":"15 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129533451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On RTL to TLM Abstraction to Benefit Simulation Performance and Modeling Productivity in NoC Design Exploration","authors":"Sven Alexander Horsinka, Rolf Meyer, J. Wagner, R. Buchty, Mladen Berekovic","doi":"10.1145/2685342.2685349","DOIUrl":"https://doi.org/10.1145/2685342.2685349","url":null,"abstract":"Growing demand to integrate more functionality into single-chip solutions requires novel network-based interconnection models. The resulting increase in design complexity and strict time-to-market restrictions endanger the viability of Register Transfer Level (RTL) centric design processes in the future. To counteract these developments, the abstract design methodologies presented by Transaction Level Modeling (TLM 2.0/SystemC) are gaining popularity. With this paper, we demonstrate the benefits of raising the abstraction level by creating an adjustable Network on Chip (NoC) simulation model, satisfying the diverse needs of software and system engineers. Based on a proven and tested RTL NoC design, we applied modeling methods defined in the TLM 2.0 standard, creating a flexible simulation model. It provides high timing accuracy, enabling precise behavioral and performance analysis. In addition, higher simulation speeds are achieved by adjusting the timing accuracy. The results demonstrate the advantages of variable simulation accuracy: simulation runs are accelerated by more than two orders of magnitude with performance and behavior assessment exposing a limited latency error of less than four clock cycles compared to the RTL model.","PeriodicalId":344147,"journal":{"name":"Network on Chip Architectures","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123602239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Connection-Then-Credit Flow Control Protocol for Networks-On-Chips: Implementation Trade-offs","authors":"M. Sallam, M. El-Kharashi, M. Dessouky","doi":"10.1145/2685342.2685348","DOIUrl":"https://doi.org/10.1145/2685342.2685348","url":null,"abstract":"The Connection-Then-Credit (CTC) end-to-end flow control protocol is an extension to the normal Credit-Based (CB) flow control. CTC was introduced to address the message dependent deadlock problem in best-effort Networks-On-Chips (NoC) while offering an area-efficient network interface with respect to the normal CB end-to-end flow control protocol, which needs a lot of buffering resources. Nevertheless, only simulation results of the CTC versus CB were presented. In this paper, we introduce an implementation of both protocols; their RTL design is presented and synthesized in TSMC 40nm CMOS technology. Post-synthesis implementation results are analyzed and compared. The CTC and CB interfaces performance were evaluated and compared using standard traffic patterns and the theoretical equations of the protocols are validated through the implementation of a complete NoC, including network interfaces, routers, and mesochronous links in mesh topology.","PeriodicalId":344147,"journal":{"name":"Network on Chip Architectures","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128323454","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Link Bandwidth Aware Backtracking Based Dynamic Task Mapping in NoC based MPSoCs","authors":"Changlin Chen, S. Cotofana","doi":"10.1145/2685342.2685343","DOIUrl":"https://doi.org/10.1145/2685342.2685343","url":null,"abstract":"In Network-on-Chip (NoC) based Multi-Processor Systems-on-Chip (MPSoCs), links, when affected by various dependability factors, may experience bandwidth reduction, which could result in substantial performance penalties if not properly considered within the application mapping process. In this paper, we propose a run-time task mapping algorithm, which takes both the path traffic load and link bandwidth into consideration and maps applications onto contiguous near-convex regions to reduce the internal and external congestion. We rely on a backtracking strategy to guarantee that the maximum link traffic load does not exceed a given limit determined by the link bandwidth and a loose factor. To evaluate our proposal we map synthetic (TGFF tool generated) and real video processing applications on partially defective 8×8 NoCs. The experiments indicate that our approach substantially outperforms equivalent state-of-the-art task mapping heuristics when NoC defects are present, e.g., for 5% broken wires, we achieve at least 16% communication cost reduction and 45% shorter average packet transmission latency.","PeriodicalId":344147,"journal":{"name":"Network on Chip Architectures","volume":"60 8","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114043544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Guaranteed Services of the NoC of a Manycore Processor","authors":"B. Dinechin, Y. Durand, D. V. Amstel, Alexandre Ghiti","doi":"10.1145/2685342.2685344","DOIUrl":"https://doi.org/10.1145/2685342.2685344","url":null,"abstract":"The Kalray MPPA®-256 processor (Multi-Purpose Processing Array) integrates 256 processing engine (PE) cores and 32 resource management (RM) cores on a single 28nm CMOS chip. These cores are distributed across 16 compute clusters and 4 I/O subsystems. On-chip communications and synchronization are supported by an explicitly routed dual data & control network-on-chip (NoC), with one node per compute cluster and 4 nodes per I/O subsystem, for a total of 32 nodes. The data NoC is dedicated to streaming data transfers and may operate with guaranteed services, thanks to non-blocking routers and flow regulation at the source node. Its architecture has been designed so that (σ, ρ) network calculus applies with minimal approximations. Given a set of flows across this data NoC with predetermined routes, we formulate the problem of guaranteeing fair allocation of bandwidth across flows and we present bounds on the maximum transfer latency. By considering the architecture of the data NoC and by introducing conservative approximations, we show how this formulation can be transformed into a linear program. Solving this linear program is efficient and the quality of its solutions appears comparable to those of the original formulation, based on problem instances obtained from the cyclostatic dataflow compilation toolchain of the Kalray MPPA®-256 processor.","PeriodicalId":344147,"journal":{"name":"Network on Chip Architectures","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115405506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"OpenSoC Fabric: On-Chip Network Generator: Using Chisel to Generate a Parameterizable On-Chip Interconnect Fabric","authors":"Farzad Fatollahi-Fard, D. Donofrio, George Michelogiannakis, J. Shalf","doi":"10.1145/2685342.2685351","DOIUrl":"https://doi.org/10.1145/2685342.2685351","url":null,"abstract":"Recent advancements in technology scaling have sparked a trend towards greater integration with large-scale chips containing thousands of processors connected to memories and other I/O devices using non-trivial network topologies. Software simulation suffers from long execution times or reduced accuracy in such complex systems, whereas hardware RTL development is too time-consuming. We present OpenSoC Fabric, a parameterizable and powerful on-chip network generator for evaluating future large-scale chip multiprocessors and SoCs. OpenSoC Fabric leverages a new hardware DSL, Chisel, which contains powerful abstractions provided by its base language, Scala, and generates both software (C++) and hardware (Verilog) models from a single code base. This is in contrast to other tools readily available which typically provide either software or hardware models, but not both. The OpenSoC Fabric infrastructure is modeled after existing state-of-the-art simulators, offers large and powerful collections of configuration options, is open-source, and uses object-oriented design and functional programming to make functionality extension as easy as possible.","PeriodicalId":344147,"journal":{"name":"Network on Chip Architectures","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126964749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Energy Efficient Load Balancing Selection Strategy for Adaptive NoC Routers","authors":"John Jose, Bivil M. Jacob, Hashim P. Kamal","doi":"10.1145/2685342.2685350","DOIUrl":"https://doi.org/10.1145/2685342.2685350","url":null,"abstract":"Modern chip multi-core systems use Network on Chip (NoC) as the communication infrastructure. Effective output channel selection techniques are used in adaptive routers, which form the backbone of NoC systems, to reduce the average packet latency of inter-core communications in multi-core systems. We propose a selection strategy, Cool Centers, for output port selection that ensures load balancing on a mesh NoC system. Cool Centers reduces the possibility of traffic hot-spot formation in the network and can be applied on any minimal adaptive routing algorithm for improving the system performance. The proposed system equally distributes the traffic load among the available minimal paths without any significant architectural overhead. This reduces the rate of non-uniform wear and tear of routers and links, and prevents early aging of chips.","PeriodicalId":344147,"journal":{"name":"Network on Chip Architectures","volume":"198 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124421137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating the Feasibility of Wireless Networks-on-Chip Enabled by Graphene","authors":"S. Abadal, Albert Mestres, M. Iannazzo, J. Solé-Pareta, E. Alarcón, A. Cabellos-Aparicio","doi":"10.1145/2685342.2685345","DOIUrl":"https://doi.org/10.1145/2685342.2685345","url":null,"abstract":"Network-on-Chip (NoC) is currently the paradigm of choice for covering the on-chip communication needs of multicore processors. As we reach the manycore era, though, electrical interconnects present performance and power issues that are exacerbated in the presence of multicast communications due to the point-to-point nature of NoCs. This dramatically limits the available design space in terms of manycore architecture, sparking the need for new solutions. In this direction, the use of wireless interconnects has been recently proposed as a complement of a wired plane. In this paper, the concept of Graphene-enabled Wireless Network-on-Chip (GWNoC) is introduced, which extends the native broadcast capabilities of existing wireless NoCs by enabling the per-core integration of antennas that radiate in the terahertz band (0.1 - 10 THz). Preliminary results on the feasibility of GWNoC are presented, covering implementation, on-chip networking and multiprocessor architecture aspects.","PeriodicalId":344147,"journal":{"name":"Network on Chip Architectures","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134212394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Partitioning Algorithm for Optimizing Neuron-to-Neuron Pathways through NoC in BMI","authors":"Jim Ng, T. Mak","doi":"10.1145/2685342.2685347","DOIUrl":"https://doi.org/10.1145/2685342.2685347","url":null,"abstract":"To study the complex interactions between neurons in a large-scale neural network and perform neural rehabilitation to restore the function of a damaged neural organ, an efficient interface and an underlying processing unit must be developed to cope with the high demand of massive real-time signal processing. The combination of Micro-Electrode Array (MEA) and Network-on-Chip (NoC) makes it possible to build a powerful monitoring, signal relaying and stimulation simulation system. This Brain Machine Interface (BMI) system is able to capture, relay and respond to neural signals in a biologically realistic way. To achieve this goal, the traffic in the NoC is managed in an efficient way to minimize the packet delay. Moreover, to raise the scalability of the system given the time delay constraint, a novel partitioning algorithm is presented to minimize the traffic generated. Existing partitioning algorithms can be used to achieve this aim, but they are inefficient when applied to this novel scenario. The proposed partitioning algorithm is designed specifically for this scenario and thus is able to reduce the traffic generated in the NoC by 25% on average. The power consumption is also reduced significantly.","PeriodicalId":344147,"journal":{"name":"Network on Chip Architectures","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126686402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"User satisfaction aware routing decisions in NOC","authors":"Swamy D. Ponpandi, A. Tyagi","doi":"10.1145/2536522.2536531","DOIUrl":"https://doi.org/10.1145/2536522.2536531","url":null,"abstract":"In mobile devices, user satisfaction with the UI interactions ought to be the primary design driver. Some recent research has integrated a saturating, non-linear user satisfaction function in thread scheduler. Mobile embedded systems are moving towards large systems on chip (SoCs) with a Network on Chip (NoC). The inter-thread communication in such a system is hosted by the NoC as a flow. The application and operating system level user satisfaction research assumes that the throughput of inter-thread edges is limited only by the computational constraints of the nodes. With NoC, however, NoC resource allocation policies play an important role in the application level user satisfaction. In this paper, we filter down the user satisfaction from an application level attribute to the routers to improve the QoS at the routing level in order to leverage the user satisfaction at the application and system level. We demonstrate that this technique improves the user satisfaction of MP3 application by 10% while maintaining the QoS guarantee of MPEG-2 application.","PeriodicalId":344147,"journal":{"name":"Network on Chip Architectures","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123422270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}