E. Gawish, M. El-Kharashi, Mohamed Fathy Abu Elyazeed
{"title":"Variability-tolerant NoC link design","authors":"E. Gawish, M. El-Kharashi, Mohamed Fathy Abu Elyazeed","doi":"10.1145/2401716.2401729","DOIUrl":"https://doi.org/10.1145/2401716.2401729","url":null,"abstract":"In this paper we propose a model for the design of Networks-on-Chip (NoC) links that takes into consideration the systematic and random effects of process variability. The model predicts the delay variations of each NoC link in a floor-plan. Delay variations are used to modify the link design parameters, like the optimal number of buffered sections and their gains, to meet the delay constraints in a more variability-tolerant way. The proposed technique is tested using test cases of 4x4 meshes at 65 nm, 45 nm, 32 nm, and 22 nm technologies. Results show that the delay variations approach 10% of the total link delay and the total power cost using our technique is up to 33% compared to the nominal delay and power values in the absence of random and systematic variation effects. Yet our methodology has a lower power cost compared to the worst-case design, saving up to 28% of the total power consumption in the test case of the 4x4 mesh at 45 nm.","PeriodicalId":344147,"journal":{"name":"Network on Chip Architectures","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129692309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ammar Karkar, Ra'ed Al-Dujaily, A. Yakovlev, K. Tong, T. Mak
{"title":"Surface wave communication system for on-chip and off-chip interconnects","authors":"Ammar Karkar, Ra'ed Al-Dujaily, A. Yakovlev, K. Tong, T. Mak","doi":"10.1145/2401716.2401720","DOIUrl":"https://doi.org/10.1145/2401716.2401720","url":null,"abstract":"Network-on-chip (NoC) is a communication paradigm that has emerged to tackle different on-chip challenges and satisfy different demands in terms of high performance and economical interconnect implementation. However, pursuing purely metal-based interconnects offers limited scalability under relentless technology scaling. To meet the scalability demand, this paper proposes a new hybrid interconnect fabric empowered by metal interconnect NoC and Zenneck surface Waves Interconnect (SWI) technology. Our initial results show a considerable power reduction (9 to 17%) and performance improvement (35%) of the proposed hybrid architecture compared to regular NoC. These results are achieved with relatively small hardware and area overhead (2.29% of die). This paper explores the promising potential of SWI for future System-on-Chip (SoC) global communication.","PeriodicalId":344147,"journal":{"name":"Network on Chip Architectures","volume":"300 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124279114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"B2RAC: a physical express link addition methodology for network on chip","authors":"Jiajia Jiao, Yuzhuo Fu","doi":"10.1145/2076501.2076505","DOIUrl":"https://doi.org/10.1145/2076501.2076505","url":null,"abstract":"As a compromise solution for Network on Chip (NoC) architecture design, adding some application-specific express links to a regular topology such as Mesh has been shown to exploit the benefits offered by both complete regularity and partial topology customization. Following this perspective, this paper proposes an enhanced link addition methodology, B2RAC, to automatically synthesize new NoC architectures and guide effective design, including: i) a flexible branch bound (B2) algorithm for iterative selection of the best link set; ii) an efficient routing-aware (RA) performance estimation model for each link addition procedure; iii) configurable (C) switches with FIFOs for the additional long link equivalence. The simulation results show that the architecture optimized by the B2RAC methodology achieves better performance (latency decreases by 16.5% and 23.46% for the typical applications VOPD and MWD respectively), with good flexibility for real application traffic, than an up-to-date link addition policy.","PeriodicalId":344147,"journal":{"name":"Network on Chip Architectures","volume":"57 11","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113973742","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yoshi Shih-Chieh Huang, Huang-Yu Liu, Yuan-Ying Chang, C. King, S. Tseng
{"title":"Floodgate: application-driven flow control in network-on-chip for many-core architectures","authors":"Yoshi Shih-Chieh Huang, Huang-Yu Liu, Yuan-Ying Chang, C. King, S. Tseng","doi":"10.1145/2076501.2076503","DOIUrl":"https://doi.org/10.1145/2076501.2076503","url":null,"abstract":"With the prevalence of multi- and many-core architecture, network-on-chip (NoC) is becoming the main paradigm for on-chip interconnection. However, the performance of NoCs can be degraded significantly if the network flow is not controlled properly. Most previous solutions have tried to detect network congestion by monitoring the hardware status of the network switches or links. Unfortunately, such strategies rely on the backpressure of the traffic flows for congestion detection and may be too slow to respond. This paper proposes a proactive strategy which predicts the global, end-to-end traffic patterns of the running application and takes preventive flow control actions to avoid congestions. The proposed system entails an application-level prediction table for accurate traffic prediction and a packet injection scheduler for congestion avoidance. The proposed scheme is evaluated by a trace-driven simulator with synthetic traffic traces as well as a real application trace of an instance in the SPLASH-2 benchmark. The results show the superior performance of the proposed scheme with negligible execution overhead.","PeriodicalId":344147,"journal":{"name":"Network on Chip Architectures","volume":"101 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124710569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Selecting the optimal system: automated design of application-specific systems-on-chip","authors":"Oscar Almer, Miles Gould, Björn Franke, N. Topham","doi":"10.1145/2076501.2076510","DOIUrl":"https://doi.org/10.1145/2076501.2076510","url":null,"abstract":"Specialising Systems-on-Chip (SoCs) for a particular application is an effective way of increasing the performance achievable for a given level of energy consumption. In fact, silicon manufacture costs are low enough that small, custom, entirely digital designs, up to and including multi-core microprocessor designs, can be manufactured cheaply in short manufacturing runs. Non-recurring engineering (NRE) costs are still prohibitive due to the high level of experience required from the design engineer and the vast size of the design space. This is even true when only pre-verified Commercial Off-the-Shelf (COTS) Intellectual Property (IP) blocks are used in the SoC design. In this paper we present a novel machine-learning based method of generating an application-specific SoC design and configuration. This approach is fully automated and can generate near-optimal application-specific SoC designs within hours rather than weeks and, hence, reduce both NRE costs and time-to-market significantly. Our methodology profiles key application characteristics using simulation of a small number of test systems and machine-learning based prediction to find likely optimal system designs for a given target application. We demonstrate the effectiveness of our automated design methodology using 82 workload applications, generate SoC designs with up to 10 cores and 8 memory banks, and show that our classifier averages up to 92% of the optimal design performance across our applications.","PeriodicalId":344147,"journal":{"name":"Network on Chip Architectures","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127207224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic clustering for distinct parallel programming models on NoC-based MPSoCs","authors":"Gustavo Girão, Thiago Santini, F. Wagner","doi":"10.1145/2076501.2076514","DOIUrl":"https://doi.org/10.1145/2076501.2076514","url":null,"abstract":"This paper investigates the impact of dynamic clustering and the use of hardware support for distinct parallel programming models in an NoC-based MPSoC environment. Using dynamically adaptable hardware, the platform provides clusters that implement either a shared memory organization or a distributed memory organization in order to meet applications' requirements without any computational overhead. The entire process is completely transparent to the programmer. In addition, a scheduler is used to take advantage of changes in the degree of parallelism of an application to improve workload balancing. Experimental results show that dynamic clustering can improve performance by up to 77% (54% on average) and can provide energy savings of up to 58% (42% on average).","PeriodicalId":344147,"journal":{"name":"Network on Chip Architectures","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132730187","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Eduardo Antunes, A. Aguiar, S. J. Filho, M. Sartori, Fabiano Hessel, C. Marcon
{"title":"Partitioning and mapping on NoC-Based MPSoC: an energy consumption saving approach","authors":"Eduardo Antunes, A. Aguiar, S. J. Filho, M. Sartori, Fabiano Hessel, C. Marcon","doi":"10.1145/2076501.2076512","DOIUrl":"https://doi.org/10.1145/2076501.2076512","url":null,"abstract":"Software complexity has increased considerably over recent years, requiring special target architectures such as MPSoCs to fulfill the heavy memory, communication and computation requirements. Nevertheless, the use of MPSoCs has brought attention to the need for effective methods and tools for parallel software development. Methodologies aggregating partitioning and mapping are normally employed to fulfill the heavy requirements of such systems. This paper explores task-partitioning and processor-mapping methods on homogeneous NoC-Based MPSoCs. The effect of both on an application's energy consumption is explored alone and jointly. Experiments with several synthetic and four real applications show that the energy consumption is reduced by up to 18%, 31.8% or 38.1% when applying partitioning, mapping or both, respectively.","PeriodicalId":344147,"journal":{"name":"Network on Chip Architectures","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116986359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing the security of time-division-multiplexing networks-on-chip through the use of multipath routing","authors":"R. Stefan, K. Goossens","doi":"10.1145/2076501.2076513","DOIUrl":"https://doi.org/10.1145/2076501.2076513","url":null,"abstract":"After gaining popularity as a method of authentication in the form of smart cards, electronic security mechanisms are making their way into the embedded domain with the goal of protecting Intellectual Property or for Digital Rights Management. A key role in implementing security at chip-level is played by the interconnect, which has the task of providing and regulating the flow of data between an increasing number of on-chip elements, not all of which can be considered trustworthy. Networks-on-Chip are emerging as a scalable solution for modern on-chip communication. In this study we aim to improve NoC security by forcing the messages to be routed on multiple disjoint paths, optionally in a non-deterministic manner. We implement our proposal and find it to have a reasonably low cost in terms of hardware area, although potentially having a larger overhead in terms of allocated bandwidth.","PeriodicalId":344147,"journal":{"name":"Network on Chip Architectures","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131401395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
H. Tatenguem, D. Ludovici, Alessandro Strano, D. Bertozzi, H. Reinig
{"title":"Contrasting multi-synchronous MPSoC design styles for fine-grained clock domain partitioning: the full-HD video playback case study","authors":"H. Tatenguem, D. Ludovici, Alessandro Strano, D. Bertozzi, H. Reinig","doi":"10.1145/2076501.2076509","DOIUrl":"https://doi.org/10.1145/2076501.2076509","url":null,"abstract":"Fine-grained (per-core) multi-synchronous systems call for new clocking strategies and new architecture design techniques. This paper compares two fundamental multi-synchronous implementation variants based on the extensive use of dual-clock FIFOs vs. mesochronous synchronizers, respectively. The architecture-homogeneous experimental setting, the cost-effective merging of synchronizers with NoC switch buffers, the sharing of as many physical synthesis steps as possible between the two architectures and the requirements of a realistic full-HD video playback application are the key innovations of this study.","PeriodicalId":344147,"journal":{"name":"Network on Chip Architectures","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132173374","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
John Jose, J. Shankar, K. Mahathi, D. K. Kumar, M. Mutyam
{"title":"BOFAR: buffer occupancy factor based adaptive router for mesh NoCs","authors":"John Jose, J. Shankar, K. Mahathi, D. K. Kumar, M. Mutyam","doi":"10.1145/2076501.2076506","DOIUrl":"https://doi.org/10.1145/2076501.2076506","url":null,"abstract":"If the route computation operation in an adaptive router returns more than one output channel, the selection strategy chooses one of them based on the congestion metric used. The effectiveness of a selection strategy depends on what metric is used to identify congestion and how precisely that metric captures the actual congestion. The number of cycles a flit stays in a router is a direct indication of the contention level of the output port through which it wants to leave. We propose the Buffer Occupancy Factor based Adaptive Router (BOFAR), wherein the history of cycles spent by flits in buffers is used as the congestion metric. BOFAR outperforms the baseline architectures built on the minimal odd-even adaptive router model with conventional selection strategies like count of free downstream virtual channels at reachable neighbors, and fluidity of buffers in downstream neighbors. Our experiments on a 4x4 mesh NoC with various synthetic traffic patterns show that BOFAR exceeds the performance of the best baseline adaptive router with 21% average and 78% maximum latency reduction at saturation load. The reduced average packet latency, increased buffer fluidity fairness, and increased saturation point of BOFAR with minimal overhead in area, power, and wiring make it a promising alternative to existing adaptive routers in mesh NoCs.","PeriodicalId":344147,"journal":{"name":"Network on Chip Architectures","volume":"123 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127411198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}