On Time Redundancy of Fault Tolerant C-Based MPSoCs
Anjana Balachandran, Nandeesha Veeranna, Benjamin Carrión Schäfer
2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), published 2016-07-01
DOI: 10.1109/ISVLSI.2016.99 (https://doi.org/10.1109/ISVLSI.2016.99)
Citations: 1
Abstract
Most prior work on hardware reliability makes use of either module (spatial) redundancy or time redundancy. In the first case, the methods assume that each module is exactly the same: multiple module replicas implementing the same logic function are executed in different hardware channels, and a voting scheme detects whether the outputs match. In the second case, the result is recomputed using the same hardware channel. These previous works mainly apply at the RT level. In this work we investigate the use of time redundancy to increase the reliability of C-Based MPSoCs. The presented method leverages the latest system-level design capabilities of commercial HLS tools, which allow the design, simulation and verification of complete SoCs at the behavioral level. Our proposed method builds complete MPSoCs at the behavioral level, which contain a variety of loosely coupled Hardware Accelerators (HWAccs) mapped as slaves onto a memory-mapped shared bus. Inactive time at each HWAcc, mainly due to read and write overheads between the masters and slaves and to bus congestion, is then used to recompute the output twice or thrice. This makes it possible to detect whether a transient fault has occurred, or even to fully mask the fault when the result is recomputed three times. Although the proposed method cannot guarantee complete fault tolerance, experimental results show that, especially for larger MPSoCs, it can in most cases recompute the output at least twice and thus detect whether a fault has occurred.
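
The detect-versus-mask distinction described in the abstract can be illustrated with a small behavioral C sketch. This is not the authors' implementation: the names `tr_runs_for_slack`, `tr_execute`, `tr_status` and `demo_kernel` are hypothetical, the idle-time model (idle cycles divided by the kernel's execution cycles) is a simplifying assumption, and "twice" and "thrice" are read here as two and three total executions on the same hardware channel. Two matching runs give detection only; three runs allow a majority vote that masks a single transient fault.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accelerator kernel: a deterministic function of its input. */
typedef uint32_t (*hwacc_fn)(uint32_t input);

typedef enum {
    TR_UNCHECKED,       /* only one run fit, nothing to compare against */
    TR_OK,              /* all runs agree */
    TR_FAULT_DETECTED,  /* mismatch found, correct value unknown */
    TR_FAULT_MASKED     /* mismatch found, majority value returned */
} tr_status;

/* Assumed slack model: how many total executions (1..3) fit, given the
 * idle cycles at the accelerator and the cycles one execution takes. */
static unsigned tr_runs_for_slack(uint32_t idle_cycles, uint32_t exec_cycles)
{
    assert(exec_cycles > 0);
    uint32_t extra = idle_cycles / exec_cycles; /* re-executions that fit */
    return extra >= 2u ? 3u : 1u + extra;
}

/* Run f `runs` times on the same channel and compare the results. */
static tr_status tr_execute(hwacc_fn f, uint32_t input, unsigned runs,
                            uint32_t *out)
{
    assert(f != NULL && out != NULL && runs >= 1u && runs <= 3u);

    uint32_t r0 = f(input);
    *out = r0;
    if (runs == 1u)
        return TR_UNCHECKED;

    uint32_t r1 = f(input);
    if (runs == 2u)
        return (r0 == r1) ? TR_OK : TR_FAULT_DETECTED;

    uint32_t r2 = f(input);
    if (r0 == r1 && r1 == r2)
        return TR_OK;
    if (r0 == r1 || r0 == r2)   /* r0 is in the majority */
        return TR_FAULT_MASKED;
    if (r1 == r2) {             /* r0 was the corrupted run */
        *out = r1;
        return TR_FAULT_MASKED;
    }
    return TR_FAULT_DETECTED;   /* all three disagree */
}

/* Demo kernel with optional fault injection, for illustration only. */
static int demo_call_count = 0;
static int demo_fault_on_call = -1;  /* -1 means never inject a fault */

static uint32_t demo_kernel(uint32_t x)
{
    uint32_t y = x * x + 1u;
    if (demo_call_count++ == demo_fault_on_call)
        y ^= 1u << 7;  /* flip one bit to model a transient upset */
    return y;
}
```

A caller would first pick the run count from the available slack, for example `tr_runs_for_slack(idle, exec)`, and then pass it to `tr_execute(demo_kernel, input, runs, &result)`. With two runs a mismatch can only be reported; with three runs a single corrupted result is outvoted, which matches the abstract's point that the method degrades from masking to detection, or to no check at all, as idle time shrinks.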