HHC: Hierarchical hardware checkpointing to accelerate fault recovery for SRAM-based FPGAs

En-Chung Yang, Keheng Huang, Yu Hu, Xiaowei Li, Jian Gong, Hongjin Liu, Bo Liu
Published in: 2013 IEEE 19th International On-Line Testing Symposium (IOLTS), 2013-07-08
DOI: 10.1109/IOLTS.2013.6604078 (https://doi.org/10.1109/IOLTS.2013.6604078)
Citations: 3

Abstract

As the feature size shrinks to the nanometer scale, SRAM-based FPGAs are increasingly vulnerable to soft errors. Checkpointing is an effective fault recovery technique that can restore a faulty system to its previous fault-free state. Because the system's function must be suspended while a checkpoint is saved or restored, the Mean Time to Repair (MTTR) is critical to system performance. In this work, we propose a hierarchical hardware checkpointing (HHC) technique that combines a high-speed on-chip checkpoint with a low-speed off-chip checkpoint to accelerate fault recovery for SRAM-based FPGAs. Most single event effect (SEE) faults can be recovered from the high-speed on-chip checkpoint, which significantly reduces the MTTR of the system. The memory resource occupation of the on-chip checkpoint is low because HHC stores only the logic states of user bits and check information for configuration bits. Experimental results show that, compared with traditional off-chip checkpoint strategies, the proposed technique can reduce the MTTR of the system by 94.30%. In addition, the memory resource occupation is 11.11% of the FPGA's memory resources, which is somewhat high but can be further optimized.
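The reasoning behind the MTTR reduction can be illustrated with a simple expected-value model. The sketch below is not the authors' evaluation method: the function names, the fallback policy (try the on-chip checkpoint first, then pay the off-chip cost as well if that fails), and all parameter values are assumptions chosen for illustration only, and the model makes no attempt to reproduce the reported 94.30% figure.

```python
def expected_mttr(p_on_chip: float, t_on_chip: float, t_off_chip: float) -> float:
    """Expected repair time for a two-level checkpoint hierarchy.

    Assumed policy (hypothetical): recovery always tries the on-chip
    checkpoint first; with probability (1 - p_on_chip) that attempt is
    not sufficient and the off-chip checkpoint is restored as well.
    """
    if not 0.0 <= p_on_chip <= 1.0:
        raise ValueError("p_on_chip must be in [0, 1]")
    if t_on_chip < 0 or t_off_chip <= 0:
        raise ValueError("restore times must be positive")
    # Fast path: on-chip restore only. Slow path: on-chip attempt + off-chip restore.
    return p_on_chip * t_on_chip + (1.0 - p_on_chip) * (t_on_chip + t_off_chip)


def mttr_reduction(p_on_chip: float, t_on_chip: float, t_off_chip: float) -> float:
    """Fractional MTTR reduction versus always restoring from off-chip."""
    return 1.0 - expected_mttr(p_on_chip, t_on_chip, t_off_chip) / t_off_chip


if __name__ == "__main__":
    # Hypothetical values: half of the faults recoverable on-chip,
    # on-chip restore 100x faster than off-chip (arbitrary time units).
    print(expected_mttr(0.5, 1.0, 100.0))
    print(mttr_reduction(0.5, 1.0, 100.0))
```

Under this model the benefit grows with both the share of faults the on-chip checkpoint can handle and the speed gap between the two levels, which matches the abstract's qualitative argument.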