Fault-tolerant mesh for 3D network on chip
Kai-Yang Hsieh, Bo-Chuan Cheng, Ruei-Ting Gu, Katherine Shu-Min Li
2011 6th International Microsystems, Packaging, Assembly and Circuits Technology Conference (IMPACT), pp. 202-205
Published: 2011-12-29
DOI: 10.1109/IMPACT.2011.6117292
Citations: 12
Abstract
3D mesh NoCs (networks on chip) are one of the best approaches to managing the complexity of interconnect structures in SoCs (systems on chip), a complexity that lowers yield. In this paper, we present a mesh-based, fault-tolerant scheme for 3D NoCs that helps increase chip reliability and yield. The scheme has four phases. Phase I transforms a 2D NoC into an optimized 3D NoC under constraints such as area, routing length, temperature, and performance. We then optimize the I/O placement to obtain the best routing between the I/O pads and all cores by clustering the core placement, and we reassign the tier sequence to minimize the number of TSVs (through-silicon vias). Finally, we build the mesh topology for each tier as the smallest square mesh that holds the maximum number of cores in any tier; for example, a 4×4 mesh is needed if a tier holds at most 15 cores. Once the 3D mesh topology is ready, phase II sets up a routing scheme that minimizes both the number of routers and the routing latency. The routing scheme also controls the data flow and distributes the communication overhead. Phase III searches for replacement routing paths, so that each connection has at least two paths. The more replacement paths are found, the more faults can be tolerated, at the cost of more computing time. Phase IV verifies the fault-tolerant 3D mesh NoC in two steps. First, we randomly insert faults and check that the NoC still works. Second, we increase the number of faults until the system fails, which gives the maximum number of faults that can be tolerated. The verification may need to be repeated hundreds of times to approximate this maximum. If the fault tolerance is insufficient, we return to phase III and search for more replacement paths. Experimental results show that the verified fault-tolerant 3D mesh scheme is effective and efficient. The scheme efficiently transforms a complex 2D NoC into a fault-tolerant 3D mesh NoC according to user-defined constraints, and it provides a tradeoff analysis between fault tolerance and the search time for effective replacement paths.
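
The mesh sizing rule and the phase IV fault-injection loop can be illustrated with a small sketch. It is not the authors' implementation: the link-only fault model, the use of plain connectivity as the "still working" criterion, and all function names (`mesh_side`, `build_mesh_links`, `all_connected`, `faults_until_failure`, `estimate_tolerance`) are assumptions made for illustration. The routing scheme and the replacement-path search of phases II and III are not reproduced.

```python
import math
import random
from collections import deque


def mesh_side(max_cores_per_tier):
    """Side of the smallest square mesh that holds the given number of cores."""
    side = math.isqrt(max_cores_per_tier)
    return side if side * side == max_cores_per_tier else side + 1


def build_mesh_links(side, tiers):
    """Links of a side x side x tiers mesh (assumed topology: nearest neighbours)."""
    links = set()
    for z in range(tiers):
        for y in range(side):
            for x in range(side):
                for dx, dy, dz in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
                    nx, ny, nz = x + dx, y + dy, z + dz
                    if nx < side and ny < side and nz < tiers:
                        links.add(((x, y, z), (nx, ny, nz)))
    return links


def all_connected(side, tiers, links):
    """Breadth-first search: True if every router is reachable from (0, 0, 0)."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    seen = {(0, 0, 0)}
    queue = deque(seen)
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) == side * side * tiers


def faults_until_failure(side, tiers, rng):
    """Remove random links one by one; return how many faults were survived."""
    links = build_mesh_links(side, tiers)
    order = sorted(links)  # fixed starting order so a seeded run is repeatable
    rng.shuffle(order)
    survived = 0
    for link in order:
        links.remove(link)
        if not all_connected(side, tiers, links):
            break  # assumed failure criterion: some router became unreachable
        survived += 1
    return survived


def estimate_tolerance(side, tiers, trials=200, seed=0):
    """Repeat the injection experiment; return (min, mean, max) survived faults."""
    rng = random.Random(seed)
    results = [faults_until_failure(side, tiers, rng) for _ in range(trials)]
    return min(results), sum(results) / len(results), max(results)
```

Here `mesh_side(15)` returns 4, matching the 4×4 example in the abstract. Calling `estimate_tolerance(mesh_side(15), 2)` repeats the random injection 200 times on a two-tier mesh, which mirrors the abstract's remark that verification may need hundreds of runs to approximate the maximum number of tolerable faults.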