{"title":"Fault-Tolerant 3D-NoC Architecture and Design: Recent Advances and Challenges","authors":"Li Jiang, Q. Xu","doi":"10.1145/2786572.2788709","DOIUrl":"https://doi.org/10.1145/2786572.2788709","url":null,"abstract":"In this paper, we survey recent research work in the design of fault-tolerant three-dimensional (3D) network-on-chip (NoC), which has drawn considerable research attention from both academia and industry. To be specific, we discuss the emerging defects introduced in 3D integration, the state-of-the-art fault-tolerant 3D router designs, various fault-tolerant routing algorithms in three dimensions, as well as the architecture and design methodologies to tolerate defective TSVs in 3D-NoC. Finally, we highlight open challenges and future research directions in this domain.","PeriodicalId":228605,"journal":{"name":"Proceedings of the 9th International Symposium on Networks-on-Chip","volume":"97 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131864246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On-Chip Decentralized Routers with Balanced Pipelines for Avoiding Interconnect Bottleneck","authors":"Ryota Yasudo, Hiroki Matsutani, M. Koibuchi, H. Amano, Tadao Nakamura","doi":"10.1145/2786572.2786583","DOIUrl":"https://doi.org/10.1145/2786572.2786583","url":null,"abstract":"Technology scaling makes designers face difficulties dealing with wire delay of long global interconnects, especially for high-radix networks. In this context, we propose decentralization of on-chip packet routers. A decentralized router consists of submodules, each of which has a particular functionality, and they are scattered along a link so that long wires are segmented. Our starting point is a conventional router architecture, and we illustrate four case studies to generalize our proposal. We also propose a new buffer design and show how to balance the pipelines of a router. A proof-of-concept is shown in 28-nm process technology. Our results demonstrate that the decentralization of an on-chip router enables Link Traversal (LT) stages to be eliminated, and the critical path delay is improved by up to 45% with reduced area compared with a conventional router. As technology advances, the benefit of the decentralized routers becomes more substantial in the nano-scale era.","PeriodicalId":228605,"journal":{"name":"Proceedings of the 9th International Symposium on Networks-on-Chip","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115065975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MapPro: Proactive Runtime Mapping for Dynamic Workloads by Quantifying Ripple Effect of Applications on Networks-on-Chip","authors":"M. Haghbayan, A. Kanduri, A. Rahmani, P. Liljeberg, A. Jantsch, H. Tenhunen","doi":"10.1145/2786572.2786589","DOIUrl":"https://doi.org/10.1145/2786572.2786589","url":null,"abstract":"Increasing dynamic workloads running on NoC-based many-core systems necessitates efficient runtime mapping strategies. With an unpredictable nature of application profiles, selecting a rational region to map an incoming application is an NP-hard problem in view of minimizing congestion and maximizing performance. In this paper, we propose a proactive region selection strategy which prioritizes nodes that offer lower congestion and dispersion. Our proposed strategy, MapPro, quantitatively represents the propagated impact of spatial availability and dispersion on the network with every new mapped application. This allows us to identify a suitable region to accommodate an incoming application that results in minimal congestion and dispersion. We cluster the network into squares of different radii to suit applications of different sizes and proactively select a suitable square for a new application, eliminating the overhead caused by typical reactive mapping approaches. We evaluated our proposed strategy over different traffic patterns and observed gains of up to 41% in energy efficiency, 28% in congestion and 21% in dispersion when compared to the state-of-the-art region selection methods.","PeriodicalId":228605,"journal":{"name":"Proceedings of the 9th International Symposium on Networks-on-Chip","volume":"36 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116852939","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mathematical Modeling and Control of Multifractal Workloads for Data-Center-on-a-Chip Optimization","authors":"P. Bogdan","doi":"10.1145/2786572.2786592","DOIUrl":"https://doi.org/10.1145/2786572.2786592","url":null,"abstract":"Building autonomous data-centers-on-chip (DCoC) for exascale computing requires mathematical frameworks that account for and exploit the non-stationary and multi-fractal characteristics of computation and communication workloads. Towards this end, relying on DCoC (Intel's SCC) measurements, we propose a complex dynamical modeling approach that captures the observed multi-fractal characteristics of inter-event times between successive workload changes and the magnitude of the increments in DCoC workloads. Our novel mathematical framework allows for the analysis of higher order moments and enables the formulation of more accurate model predictive control strategies for multi-fractal dynamics. We investigate the impact of the multi-fractal spectrum richness on the performance of the control algorithm. Our mathematical formalism can further be used to model, analyze and solve DCoC design problems (e.g., topology reconfiguration, buffer sizing, mapping, scheduling, resource management, congestion control).","PeriodicalId":228605,"journal":{"name":"Proceedings of the 9th International Symposium on Networks-on-Chip","volume":"107 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117293117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data Criticality in Network-On-Chip Design","authors":"Joshua San Miguel, Natalie D. Enright Jerger","doi":"10.1145/2786572.2786593","DOIUrl":"https://doi.org/10.1145/2786572.2786593","url":null,"abstract":"Many network-on-chip (NoC) designs focus on maximizing performance, delivering data to each core no later than needed by the application. Yet to achieve greater energy efficiency, we argue that it is just as important that data is delivered no earlier than needed. To address this, we explore data criticality in CMPs. Caches fetch data in bulk (blocks of multiple words). Depending on the application's memory access patterns, some words are needed right away (critical) while other data are fetched too soon (non-critical). On a wide range of applications, we perform a limit study of the impact of data criticality in NoC design. Criticality-oblivious designs can waste up to 37.5% energy, compared to an idealized NoC that fetches each word both no later and no earlier than needed. Furthermore, 62.3% of energy is wasted fetching data that is not used by the application. We present NoCNoC, a practical, criticality-aware NoC design that achieves up to 60.5% energy savings with no loss in performance. Our work moves towards an ideally-efficient NoC, delivering data both no later and no earlier than needed.","PeriodicalId":228605,"journal":{"name":"Proceedings of the 9th International Symposium on Networks-on-Chip","volume":"117 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117257137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Asymmetric NoC Architectures for GPU Systems","authors":"Amir Kavyan Ziabari, José L. Abellán, Yenai Ma, A. Joshi, D. Kaeli","doi":"10.1145/2786572.2786596","DOIUrl":"https://doi.org/10.1145/2786572.2786596","url":null,"abstract":"While both Chip MultiProcessors (CMPs) and Graphics Processing Units (GPUs) are many-core systems, they exhibit different memory access patterns. CMPs execute threads in parallel, where threads communicate and synchronize through the memory hierarchy (without any coalescing). GPUs on the other hand execute a large number of independent thread blocks and their accesses to memory are frequent and coalesced, resulting in a completely different access pattern. NoC designs for GPUs have not been extensively explored. In this paper, we first evaluate several NoC designs for GPUs to determine the most power/performance efficient NoCs. To improve NoC energy efficiency, we explore an asymmetric NoC design tailored for a GPU's memory access pattern, providing one network for L1-to-L2 communication and a second for L2-to-L1 traffic. Our analysis shows that an asymmetric multi-network Cmesh provides the most energy-efficient communication fabric for our target GPU system.","PeriodicalId":228605,"journal":{"name":"Proceedings of the 9th International Symposium on Networks-on-Chip","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129746519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NoC Architectures as Enablers of Biological Discovery for Personalized and Precision Medicine","authors":"P. Bogdan, Turbo Majumder, A. Ramanathan, Yuankun Xue","doi":"10.1145/2786572.2788706","DOIUrl":"https://doi.org/10.1145/2786572.2788706","url":null,"abstract":"This paper overviews the main computational issues in personalized and precision medicine (PPM), and presents a cogent case for network-on-chip (NoC)-based multicore platforms as enablers in the process. We identify a series of challenges for the design and optimization of NoC-based solutions for PPM. To capture the characteristics of the cyber-physical sensing and processing, we propose a new computational model built on a dynamical heterogeneous hyper-graph description of application-to-architecture interactions. Starting from these premises, we summarize a few implications on NoC design methodologies, present some NoC-based solutions that deal with some of the challenges, and outline a few open problems.","PeriodicalId":228605,"journal":{"name":"Proceedings of the 9th International Symposium on Networks-on-Chip","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127346688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fault-tolerant Network-on-Chip based on Fault-aware Flits and Deflection Routing","authors":"Armin Runge","doi":"10.1145/2786572.2786585","DOIUrl":"https://doi.org/10.1145/2786572.2786585","url":null,"abstract":"Deflection routing is a promising approach for energy and hardware efficient NoCs. Future VLSI designs will have an increasing susceptibility to failures and breakdowns. The inherent redundancy of NoCs can be used to tolerate such failures. We extended the non-fault-tolerant CHIPPER router architecture to enable fault-tolerance. This architecture is based on deflection routing and utilizes a permutation network instead of a crossbar. The permutation network eliminates the sequential dependence of the priority-based port allocation. Compared to a crossbar-based design, a permutation network allows a faster and smaller router design. Simulations of an 8 × 8 network and more than 30,000 flit injections show that our router architecture is competitive with existing crossbar-based fault-tolerant router architectures.","PeriodicalId":228605,"journal":{"name":"Proceedings of the 9th International Symposium on Networks-on-Chip","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123942717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parka: Thermally Insulated Nanophotonic Interconnects","authors":"Y. Demir, N. Hardavellas","doi":"10.1145/2786572.2786597","DOIUrl":"https://doi.org/10.1145/2786572.2786597","url":null,"abstract":"Silicon-photonics are emerging as the prime candidate technology for energy-efficient on-chip interconnects at future process nodes. However, current designs are primarily based on microrings, which are highly sensitive to temperature. As a result, current silicon-photonic interconnect designs expend a significant amount of energy heating the microrings to a designated narrow temperature range, only to have the majority of the thermal energy waste away and dissipate through the heat sink, and in the process of doing so heat up the logic layer, causing significant performance degradation to the cores and inducing thermal emergencies. We propose Parka, a nanophotonic interconnect that encases the photonic die in a thermal insulator that keeps its temperature stable with low energy expenditure, while minimizing the spatial and temporal thermal coupling between logic and silicon-photonic components. Parka reduces the microring energy by 3.8--5.4x and achieves 11--23% speedup on average (34% max) depending on the cooling solution used.","PeriodicalId":228605,"journal":{"name":"Proceedings of the 9th International Symposium on Networks-on-Chip","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124545428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fabrics on Die: Where Function, Debug and Test Meet","authors":"Priyadarsan Patra, C. Prudvi","doi":"10.1145/2786572.2788712","DOIUrl":"https://doi.org/10.1145/2786572.2788712","url":null,"abstract":"In this paper, we briefly present how packet-based networks, or fabrics, have found their way into diverse usages on high-end industrial designs today. We outline the salient features, use models and challenges involved in the implementation and application of these fabrics, not only in functional communication but also in power management, silicon debug and high-volume-manufacturing test. Both debug and test hooks in SOC/NOC and some test/debug scenarios are discussed. We touch on some recent advances in functional networks and their implications for debug & test.","PeriodicalId":228605,"journal":{"name":"Proceedings of the 9th International Symposium on Networks-on-Chip","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122215928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}