Comparative study: AutoDPR-SEM for enhancing CNN reliability in SRAM-based FPGAs through autonomous reconfiguration

Impact factor 1.6; Zone 4 (Engineering and Technology); JCR Q3 (Engineering, Electrical & Electronic)
Haonan Tian, Younis Ibrahim, Rui Chen, Yixiu Wang, Chen Jin, George Belev, Li Chen
Microelectronics Reliability, published 2024-04-20
DOI: 10.1016/j.microrel.2024.115392
https://www.sciencedirect.com/science/article/pii/S0026271424000726

Abstract

Convolutional neural networks (CNNs) are widely adopted in safety-critical systems, including space applications and autonomous vehicles. SRAM-based field-programmable gate arrays (FPGAs) are preferred for accelerating CNN computations due to their unique characteristics. However, the configuration memory of these FPGAs is susceptible to single event effects (SEEs), which can corrupt computations and lead to misclassification by the CNN. In this study, we used two-photon absorption (TPA) laser fault injection to investigate the impact of SEEs on SRAM-based FPGAs, comparing two popular CNN acceleration architectures: streaming architecture (SA) and single compute engine (SCE). Experimental results show that SA-based CNNs require more hardware resources but are more resilient to single event upsets (SEUs). Without any radiation-hardened-by-design (RHBD) protection, the error rate of SCE is approximately twice that of SA. To mitigate errors, the Xilinx Soft Error Mitigation (SEM) IP core is used for error detection and correction, reducing error rates by up to 50% in both architectures. We further propose AutoDPR-SEM (Autonomous Dynamic Partial Reconfiguration for Soft Error Mitigation), which automatically reconfigures the SEM IP core when it remains idle due to uncorrectable errors. AutoDPR-SEM reduces CNN error rates by a factor of approximately 17.8 in SCE and 14.8 in SA. We also applied software-level simulation to validate the TPA experiment; it showed trends similar to the test results across all models. In conclusion, the study confirms the feasibility of AutoDPR-SEM in both architectures and shows its potential to reduce CNN error rates in safety-critical systems.
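
The recovery idea described in the abstract (reconfigure the SEM core when it sits idle after an uncorrectable error) can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the authors' implementation and not the Xilinx SEM interface: the names SemState, SimulatedSem, reconfigure, and run are hypothetical, and upsets are modeled simply as a sequence of booleans (True for a correctable upset).

```python
from dataclasses import dataclass
from enum import Enum, auto


class SemState(Enum):
    OBSERVATION = auto()  # core is scanning and can repair upsets
    IDLE = auto()         # core halted after an uncorrectable error


@dataclass
class SimulatedSem:
    """Toy stand-in for an SEM-like core (hypothetical, not the real IP)."""
    state: SemState = SemState.OBSERVATION
    corrected: int = 0

    def handle_upset(self, correctable: bool) -> bool:
        """Process one upset; return True if it was repaired."""
        if self.state is SemState.IDLE:
            return False  # an idle core no longer repairs anything
        if correctable:
            self.corrected += 1
            return True
        self.state = SemState.IDLE  # uncorrectable error halts the core
        return False

    def reconfigure(self) -> None:
        # Assumption: stands in for partially reconfiguring the SEM region.
        self.state = SemState.OBSERVATION


def run(events, auto_dpr: bool):
    """Return (corrected, missed) counts for a sequence of upsets."""
    sem = SimulatedSem()
    missed = 0
    for correctable in events:
        # The autonomous step: restart the core if it is found idle.
        if auto_dpr and sem.state is SemState.IDLE:
            sem.reconfigure()
        if not sem.handle_upset(correctable):
            missed += 1
    return sem.corrected, missed


if __name__ == "__main__":
    upsets = [True, False, True, True]
    print(run(upsets, auto_dpr=False))  # (1, 3)
    print(run(upsets, auto_dpr=True))   # (3, 1)
```

In this toy model, one uncorrectable upset leaves every later upset unrepaired unless the core is restarted, which is the failure mode the autonomous reconfiguration targets. The counts are illustrative only and say nothing about the measured error rates reported in the study.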

Source journal: Microelectronics Reliability (Engineering and Technology: Engineering, Electrical & Electronic)
CiteScore: 3.30
Self-citation rate: 12.50%
Articles published: 342
Review time: 68 days
About the journal: Microelectronics Reliability is dedicated to disseminating the latest research results and related information on the reliability of microelectronic devices, circuits and systems, from materials, process and manufacturing, to design, testing and operation. The coverage of the journal includes the following topics: measurement, understanding and analysis; evaluation and prediction; modelling and simulation; methodologies and mitigation. Papers which combine reliability with other important areas of microelectronics engineering, such as design, fabrication, integration, testing, and field operation, will also be welcome, and practical papers reporting case studies in the field and specific application domains are particularly encouraged. Most accepted papers will be published as Research Papers, describing significant advances and completed work. Papers reviewing important developing topics of general interest may be accepted for publication as Review Papers. Urgent communications of a more preliminary nature and short reports on completed practical work of current interest may be considered for publication as Research Notes. All contributions are subject to peer review by leading experts in the field.