BLOCKWATCH: Leveraging similarity in parallel programs for error detection

Jiesheng Wei, K. Pattabiraman
{"title":"BLOCKWATCH: Leveraging similarity in parallel programs for error detection","authors":"Jiesheng Wei, K. Pattabiraman","doi":"10.1109/DSN.2012.6263959","DOIUrl":null,"url":null,"abstract":"The scaling of Silicon devices has exacerbated the unreliability of modern computer systems, and power constraints have necessitated the involvement of software in hardware error detection. Simultaneously, the multi-core revolution has impelled software to become parallel. Therefore, there is a compelling need to protect parallel programs from hardware errors. Parallel programs' tasks have significant similarity in control data due to the use of high-level programming models. In this study, we propose BLOCKWATCH to leverage the similarity in parallel program's control data for detecting hardware errors. BLOCKWATCH statically extracts the similarity among different threads of a parallel program and checks the similarity at runtime. We evaluate BLOCKWATCH on seven SPLASH-2 benchmarks to measure its performance overhead and error detection coverage. We find that BLOCKWATCH incurs an average overhead of 16% across all programs, and provides an average SDC coverage of 97% for faults in the control data.","PeriodicalId":236791,"journal":{"name":"IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"18","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DSN.2012.6263959","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 18

Abstract

The scaling of Silicon devices has exacerbated the unreliability of modern computer systems, and power constraints have necessitated the involvement of software in hardware error detection. Simultaneously, the multi-core revolution has impelled software to become parallel. Therefore, there is a compelling need to protect parallel programs from hardware errors. Parallel programs' tasks have significant similarity in control data due to the use of high-level programming models. In this study, we propose BLOCKWATCH to leverage the similarity in parallel program's control data for detecting hardware errors. BLOCKWATCH statically extracts the similarity among different threads of a parallel program and checks the similarity at runtime. We evaluate BLOCKWATCH on seven SPLASH-2 benchmarks to measure its performance overhead and error detection coverage. We find that BLOCKWATCH incurs an average overhead of 16% across all programs, and provides an average SDC coverage of 97% for faults in the control data.
BLOCKWATCH:利用并行程序中的相似性进行错误检测
硅器件的规模化加剧了现代计算机系统的不可靠性,而功率限制使得软件参与硬件错误检测成为必要。与此同时,多核革命推动了软件的并行化。因此,迫切需要保护并行程序不受硬件错误的影响。由于使用了高级编程模型,并行程序的任务在控制数据上具有显著的相似性。在本研究中,我们提出BLOCKWATCH来利用并行程序控制数据的相似性来检测硬件错误。BLOCKWATCH静态地提取并行程序中不同线程之间的相似度,并在运行时检查相似度。我们在七个SPLASH-2基准上评估BLOCKWATCH,以衡量其性能开销和错误检测覆盖率。我们发现,BLOCKWATCH在所有程序中产生的平均开销为16%,并为控制数据中的故障提供了97%的平均SDC覆盖率。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信