A. E. Wilson, Nathan Baker, Ethan Campbell, Michael Wirthlin
ACM Transactions on Reconfigurable Technology and Systems, published 2024-07-19. DOI: 10.1145/3674841 (https://doi.org/10.1145/3674841)
Improving Fault Tolerance for FPGA SoCs Through Post Radiation Design Analysis
FPGAs have been shown to operate reliably within harsh radiation environments by employing single-event upset (SEU) mitigation techniques such as configuration scrubbing, triple-modular redundancy, error correction coding, and radiation-aware implementation techniques. The effectiveness of these techniques, however, is limited in complex system-level designs that employ complex I/O interfaces with single-point failures. In previous work, a complex SoC system running Linux applied several of these techniques only to obtain an improvement of \(14\times\) in Mean Time to Failure (MTTF). A detailed post-radiation fault analysis found that the limitations in reliability were due to the DDR interface, the global clock network, and interconnect. This paper applies a number of design-specific SEU mitigation techniques to address the limitations in reliability of this design. These changes include triplicating the global clock, optimizing the placement of the reduction output voters and input flip-flops, and employing a mapping technique called “striping”. The application of these techniques improved the MTTF of the mitigated design by a factor of \(1.54\times\) and thus provides a \(22.8\times\) MTTF improvement over the unmitigated design. A post-radiation fault analysis using BFAT was also performed to find the remaining design vulnerabilities.
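As background for the triple-modular redundancy the abstract mentions, the following is a minimal software sketch of a bitwise 2-of-3 majority voter. It is an illustration only, not the authors' implementation: the paper's voters are hardware in the FPGA fabric, and the names `majority_vote` and `flip_bit` and the 8-bit test word are assumptions made here for the example.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Return the bitwise 2-of-3 majority of three redundant words."""
    return (a & b) | (a & c) | (b & c)


def flip_bit(word: int, bit: int) -> int:
    """Model a single-event upset by inverting one bit of a word (hypothetical helper)."""
    return word ^ (1 << bit)


golden = 0b1011_0010          # arbitrary example value, not taken from the paper
upset = flip_bit(golden, 3)   # one replica suffers a single-bit upset

# Two replicas still agree, so the voter masks the fault.
assert majority_vote(golden, upset, golden) == golden

# The same bit corrupted in two replicas defeats the voter.
assert majority_vote(upset, upset, golden) != golden
```

The second assertion shows why voting alone does not help against a fault that reaches more than one replica at once. In general, a resource shared by all three copies, such as a single clock, is not protected by the voter, which is consistent with the abstract listing the global clock network among the reliability limitations and its triplication among the changes.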