A Markovian Approach for Detecting Failures in the Xilinx SEM core
T. Rajkumar, Johnny Öberg
2022 International Conference on Field-Programmable Technology (ICFPT), 2022-12-05
DOI: https://doi.org/10.1109/ICFPT56656.2022.9974240
The soft error mitigation (SEM) core is an internal scrubber used to detect and correct single event upsets in the configuration memory. Although the core can mitigate errors with high accuracy, recent studies have found it to be vulnerable to radiation-induced errors because it is itself implemented in the FPGA fabric. As the reliability of the system depends on the correctness of the scrubber, an undetected SEM failure is hazardous in critical applications. In this study, we investigate the effectiveness of Markov chains in detecting such failures. To minimise the effects of single event upsets on the detector itself, the detection scheme is implemented external to the FPGA and uses log analysis to monitor the health of the SEM core. We evaluated our approach on the Xilinx ZCU104 UltraScale+ board using fault injection. The results show that SEM failures caused by single-bit and double-bit errors could be detected with $F_{1}$ scores of 0.90 and 0.99, respectively. To the best of our knowledge, this is the first custom approach for failure detection in the SEM core.
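
The abstract does not spell out the model, so the following is only a minimal sketch of how a first-order Markov chain could flag abnormal log sequences in general. It assumes the SEM log has already been reduced to a sequence of event labels; the labels (`INIT`, `OBS`, `COR`, `CLS`), the add-one smoothing, the average log-likelihood score, and the threshold value are all illustrative assumptions, not the authors' method or parameters.

```python
import math
from collections import defaultdict


def fit_transitions(sequences, smoothing=1.0):
    """Estimate first-order transition probabilities from fault-free logs.

    Additive smoothing keeps unseen transitions at a small non-zero
    probability (an assumption of this sketch).
    """
    states = sorted({s for seq in sequences for s in seq})
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1.0
    probs = {}
    for prev in states:
        total = sum(counts[prev].values()) + smoothing * len(states)
        probs[prev] = {
            nxt: (counts[prev][nxt] + smoothing) / total for nxt in states
        }
    return probs


def avg_log_likelihood(seq, probs, floor=1e-6):
    """Mean log-probability per transition; `floor` covers unknown labels."""
    pairs = list(zip(seq, seq[1:]))
    if not pairs:
        return 0.0
    total = sum(math.log(probs.get(a, {}).get(b, floor)) for a, b in pairs)
    return total / len(pairs)


def is_anomalous(seq, probs, threshold):
    """Flag a sequence whose transitions are unlikely under the model."""
    return avg_log_likelihood(seq, probs) < threshold


# Hypothetical event labels standing in for parsed SEM log messages.
normal_run = ["INIT", "OBS", "COR", "CLS", "OBS", "COR", "CLS", "OBS"]
model = fit_transitions([normal_run] * 20)

THRESHOLD = -1.0  # illustrative value; would be tuned on validation logs
healthy = ["INIT", "OBS", "COR", "CLS", "OBS"]
suspect = ["OBS", "CLS", "INIT", "COR"]  # transitions never seen in training
```

Here `is_anomalous(healthy, model, THRESHOLD)` evaluates to `False` and `is_anomalous(suspect, model, THRESHOLD)` to `True`, because the second sequence consists only of transitions that never occur in the training logs. Running such a monitor on a host outside the FPGA matches the abstract's point that the detector should not share the fabric it is watching.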