Improving Fault Tolerance for FPGA SoCs Through Post Radiation Design Analysis

A. E. Wilson, Nathan Baker, Ethan Campbell, Michael Wirthlin
ACM Transactions on Reconfigurable Technology and Systems. Published 2024-07-19. DOI: 10.1145/3674841 (https://doi.org/10.1145/3674841)

Abstract

FPGAs have been shown to operate reliably within harsh radiation environments by employing single-event upset (SEU) mitigation techniques such as configuration scrubbing, triple-modular redundancy, error correction coding, and radiation-aware implementation techniques. The effectiveness of these techniques, however, is limited in complex system-level designs that employ complex I/O interfaces with single-point failures. In previous work, a complex SoC system running Linux applied several of these techniques only to obtain an improvement of 14\(\times\) in Mean Time to Failure (MTTF). A detailed post-radiation fault analysis found that the limitations in reliability were due to the DDR interface, the global clock network, and interconnect. This paper applies a number of design-specific SEU mitigation techniques to address the limitations in reliability of this design. These changes include triplicating the global clock, optimizing the placement of the reduction output voters and input flip-flops, and employing a mapping technique called “striping”. The application of these techniques improved the MTTF of the mitigated design by a factor of 1.54\(\times\) and thus provides a 22.8\(\times\) MTTF improvement over the unmitigated design. A post-radiation fault analysis using BFAT was also performed to find the remaining design vulnerabilities.
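
The triple-modular redundancy and voters mentioned in the abstract can be illustrated with a small software model. The sketch below is not the paper's implementation, which uses hardware voters placed on the FPGA fabric; the names `majority_vote` and `flip_bit` and the 8-bit example value are assumptions made purely for illustration. It shows why a 2-of-3 bitwise majority masks an upset in one replica, and why an upset that reaches two replicas at once (for example through a shared, non-triplicated resource) is not masked.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each output bit agrees with at least two inputs."""
    return (a & b) | (a & c) | (b & c)


def flip_bit(word: int, bit: int) -> int:
    """Model a single-event upset by inverting one bit of a word (hypothetical helper)."""
    return word ^ (1 << bit)


# Hypothetical 8-bit value held identically by three redundant copies.
golden = 0b1011_0010

# One replica suffers a simulated upset; the other two are intact.
single_upset = majority_vote(golden, flip_bit(golden, 3), golden)

# The same bit is upset in two replicas, as a shared single point of
# failure might cause; the voter then outputs the corrupted value.
double_upset = majority_vote(flip_bit(golden, 3), flip_bit(golden, 3), golden)

print(f"single upset masked: {single_upset == golden}")
print(f"double upset masked: {double_upset == golden}")
```

In this model the single upset is outvoted while the double upset is not, which is the general reason a shared resource such as an untriplicated global clock can limit the benefit of redundancy.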