Haonan Tian, Younis Ibrahim, Rui Chen, Yixiu Wang, Chen Jin, George Belev, Li Chen
Title: Comparative study: AutoDPR-SEM for enhancing CNN reliability in SRAM-based FPGAs through autonomous reconfiguration
Journal: Microelectronics Reliability (Impact Factor 1.6; JCR Q3, Engineering, Electrical & Electronic)
Publication date: 2024-04-20
DOI: 10.1016/j.microrel.2024.115392
URL: https://www.sciencedirect.com/science/article/pii/S0026271424000726
Citations: 0
Abstract
Convolutional neural networks (CNNs) are widely adopted in safety-critical systems, including space applications and autonomous vehicles. SRAM-based field-programmable gate arrays (FPGAs) are preferred for accelerating CNN computations due to their unique characteristics. However, the configuration memory of these FPGAs is susceptible to single event effects (SEEs), which can corrupt computations and cause the CNN to misclassify its inputs. In this study, we investigated the impact of SEEs on SRAM-based FPGAs using Two-Photon Absorption (TPA) laser fault injection, through a comparative analysis of two popular CNN acceleration architectures: streaming architecture (SA) and single compute engine (SCE). Experimental results show that SA-based CNNs require more hardware resources but exhibit superior resilience against single event upsets (SEUs). Without any Radiation Hardened by Design (RHBD) protection, SCE has an error rate approximately twice that of SA. To mitigate errors, the Xilinx Soft Error Mitigation (SEM) IP core is used for error detection and correction, reducing error rates by up to 50% in both architectures. Importantly, we propose the AutoDPR-SEM (Autonomous Dynamic Partial Reconfiguration for Soft Error Mitigation) approach, which automatically reconfigures the SEM IP core when it remains idle due to uncorrectable errors. AutoDPR-SEM significantly lowers CNN error rates, by factors of approximately 17.8 in SCE and 14.8 in SA. We also applied software-level simulation to validate the TPA experiment, which showed similar trends across all models. In conclusion, the study confirms the feasibility of AutoDPR-SEM in both architectures, showcasing its potential to reduce CNN error rates in safety-critical systems.
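The trigger condition described in the abstract (reconfigure the SEM core when it remains idle after an uncorrectable error) can be illustrated with a small software model. The sketch below is not the authors' implementation: the two-state `SemState` enum, the `AutoDprWatchdog` class, and the `idle_limit` threshold are all hypothetical names and simplifications introduced here, and the actual partial reconfiguration step is represented only by a counter.

```python
from dataclasses import dataclass
from enum import Enum, auto


class SemState(Enum):
    """Simplified, assumed view of the SEM core status."""
    OBSERVATION = auto()  # SEM is scanning configuration memory
    IDLE = auto()         # SEM has stopped, e.g. after an uncorrectable error


@dataclass
class AutoDprWatchdog:
    """Hypothetical watchdog: counts consecutive idle samples of the SEM core."""
    idle_limit: int = 3        # assumed number of idle polls tolerated
    idle_polls: int = 0
    reconfigurations: int = 0  # stands in for partial reconfiguration events

    def poll(self, state: SemState) -> bool:
        """Record one status sample; return True if reconfiguration is triggered."""
        if state is SemState.IDLE:
            self.idle_polls += 1
        else:
            self.idle_polls = 0
        if self.idle_polls >= self.idle_limit:
            # In hardware, this is where the SEM region would be reconfigured.
            self.idle_polls = 0
            self.reconfigurations += 1
            return True
        return False


def run(trace, idle_limit=3):
    """Feed a sequence of SEM states to a fresh watchdog and collect its decisions."""
    watchdog = AutoDprWatchdog(idle_limit=idle_limit)
    return [watchdog.poll(state) for state in trace]
```

With a trace of one observation sample followed by three idle samples, `run` reports a trigger only on the third consecutive idle sample; any non-idle sample resets the count, so a briefly idle core is left alone.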
About the journal:
Microelectronics Reliability is dedicated to disseminating the latest research results and related information on the reliability of microelectronic devices, circuits and systems, from materials, process and manufacturing, to design, testing and operation. The coverage of the journal includes the following topics: measurement, understanding and analysis; evaluation and prediction; modelling and simulation; methodologies and mitigation. Papers which combine reliability with other important areas of microelectronics engineering, such as design, fabrication, integration, testing, and field operation are also welcome, and practical papers reporting case studies in the field and specific application domains are particularly encouraged.
Most accepted papers will be published as Research Papers, describing significant advances and completed work. Papers reviewing important developing topics of general interest may be accepted for publication as Review Papers. Urgent communications of a more preliminary nature and short reports on completed practical work of current interest may be considered for publication as Research Notes. All contributions are subject to peer review by leading experts in the field.