Data Flow Error Recovery with Checkpointing and Instruction-Level Fault Tolerance

Lei Xiong, QingPing Tan
Published in: 2011 12th International Conference on Parallel and Distributed Computing, Applications and Technologies
Publication date: 2011-10-20
DOI: 10.1109/PDCAT.2011.33 (https://doi.org/10.1109/PDCAT.2011.33)
Citations: 4

Abstract

Soft error detection and recovery are important to system reliability, especially as fabrication technology improves. Instruction-level soft error tolerance methods, which require no additional hardware, have been widely discussed. This paper proposes an application-level data flow error recovery approach that combines checkpointing with instruction-level fault tolerance. At the instruction level, code is divided into protected and unprotected code based on its sensitivity to hardware soft errors. In protected code, every data value is kept in two copies. At certain program points, such as store and branch instructions, the related data are checked. If the two copies are not identical, a soft error is considered to have occurred, and the program state is restored from a prior checkpoint associated with the erroneous data. For each checked data value, the associated checkpoint is saved based on a program slice computed over the program from its beginning to the checked data. Finally, the approach is implemented, and experimental results demonstrate the approach.
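
The duplicate, check, and restore cycle described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' implementation: the paper works at the instruction level, whereas here Python dictionaries stand in for the two copies of each data value, and the names `ProtectedState`, `run_with_recovery`, and `demo` are hypothetical. The checkpoint simply records the input variables that the checked value depends on, a rough stand-in for the slice-based checkpoint the abstract mentions, and the soft error is injected by hand.

```python
class SoftErrorDetected(Exception):
    """Raised when the two copies of a protected value disagree."""


class ProtectedState:
    """Keeps two copies of every protected variable plus one checkpoint."""

    def __init__(self, **initial):
        self.primary = dict(initial)
        self.shadow = dict(initial)
        self.checkpoint = {}

    def save_checkpoint(self, names):
        # Assumption: save only the variables the later check depends on.
        self.checkpoint = {n: self.primary[n] for n in names}

    def restore_checkpoint(self):
        # Overwrite both copies with the saved, known-good values.
        self.primary.update(self.checkpoint)
        self.shadow.update(self.checkpoint)

    def compute(self, target, fn, *sources):
        # Execute the same operation once on each copy.
        self.primary[target] = fn(*(self.primary[s] for s in sources))
        self.shadow[target] = fn(*(self.shadow[s] for s in sources))

    def check(self, name):
        # Compare the copies at a check point (e.g. before a store or branch).
        if self.primary[name] != self.shadow[name]:
            raise SoftErrorDetected(name)
        return self.primary[name]


def run_with_recovery(state, body, inputs, max_retries=3):
    """Run body; on a detected mismatch, roll back and re-execute."""
    state.save_checkpoint(inputs)
    for _ in range(max_retries + 1):
        try:
            return body(state)
        except SoftErrorDetected:
            state.restore_checkpoint()
    raise RuntimeError("error persisted after all retries")


def demo():
    state = ProtectedState(a=3, b=4)
    faults = [("a", 99)]  # one simulated transient fault

    def body(s):
        if faults:
            name, bad = faults.pop()
            s.primary[name] = bad  # corrupt only one of the two copies
        s.compute("c", lambda x, y: x * y, "a", "b")
        return s.check("c")  # check before the value would be stored

    return run_with_recovery(state, body, inputs=["a", "b"])


if __name__ == "__main__":
    print(demo())
```

In this sketch the first pass through `body` detects the mismatch in `c`, the inputs are restored from the checkpoint, and the second pass completes without a fault. Restoring only the variables that the checked value depends on keeps the checkpoint small, which appears to be the motivation for tying checkpoints to program slices.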