{"title":"Fault recovery communication protocol for NoC-based MPSoCs","authors":"E. Wächter, Alexandre M. Amory, F. Moraes","doi":"10.1109/ISVLSI.2013.6654648","DOIUrl":null,"url":null,"abstract":"Mechanisms for fault-tolerance in MPSoCs are mandatory to cope with faults during fabrication or product lifetime. For instance, permanent faults on the interconnect network can stall or crash applications even though the network has alternative fault-free paths to a given destination. This PhD work presents a fault-tolerant communication protocol that takes advantage of the NoC routing method to provide alternative paths between any source-target pair of processors. At the application layer, the method is seen as a typical MPI-like message passing protocol. At the lower layers, the method consists of a software kernel layer that monitors the regularity of message exchanges between pairs of tasks. If a message is not delivered in a certain time, the software fires the path finding mechanism, which guarantees complete network reachability. The proposed approach determines new paths quickly, and the costs of extra silicon area and memory usage are small.","PeriodicalId":439122,"journal":{"name":"2013 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"80 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISVLSI.2013.6654648","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2
Abstract
Mechanisms for fault-tolerance in MPSoCs are mandatory to cope with faults during fabrication or product lifetime. For instance, permanent faults on the interconnect network can stall or crash applications even though the network has alternative fault-free paths to a given destination. This PhD work presents a fault-tolerant communication protocol that takes advantage of the NoC routing method to provide alternative paths between any source-target pair of processors. At the application layer, the method is seen as a typical MPI-like message passing protocol. At the lower layers, the method consists of a software kernel layer that monitors the regularity of message exchanges between pairs of tasks. If a message is not delivered in a certain time, the software fires the path finding mechanism, which guarantees complete network reachability. The proposed approach determines new paths quickly, and the costs of extra silicon area and memory usage are small.