Architectural vulnerability aware checkpoint placement in a multicore processor

A. Lotfi, Arash Bayat, S. Safari
{"title":"Architectural vulnerability aware checkpoint placement in a multicore processor","authors":"A. Lotfi, Arash Bayat, S. Safari","doi":"10.1109/IOLTS.2012.6313852","DOIUrl":null,"url":null,"abstract":"As the system complexity increases, the failure probability increases substantially. Therefore, the system requires techniques for supporting fault tolerance. Checkpointing technique is widely used to reduce the execution time of long-running programs in presence of failures and enhancing the reliability of such systems. Several methods were studied thus far in order to determine the checkpointing interval which optimizes system performance. The crucial parameter in all of these solutions is system failure model which is primarily assumed as exponential or Weibull distributions. But, these models are not perfectly accurate since they fail to model the effect of soft errors. In this paper, we introduce a more realistic failure model based on the processors AVF. In addition, we propose three checkpoint placement methods with constant and variable intervals that determine suitable checkpoint places for the proposed failure model. Our experimental results show that our method, which is implementable on any multicore system, can find the suitable points in which checkpoints should be taken.","PeriodicalId":246222,"journal":{"name":"2012 IEEE 18th International On-Line Testing Symposium (IOLTS)","volume":"157 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 IEEE 18th International On-Line Testing Symposium (IOLTS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IOLTS.2012.6313852","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

As the system complexity increases, the failure probability increases substantially. Therefore, the system requires techniques for supporting fault tolerance. Checkpointing technique is widely used to reduce the execution time of long-running programs in presence of failures and enhancing the reliability of such systems. Several methods were studied thus far in order to determine the checkpointing interval which optimizes system performance. The crucial parameter in all of these solutions is system failure model which is primarily assumed as exponential or Weibull distributions. But, these models are not perfectly accurate since they fail to model the effect of soft errors. In this paper, we introduce a more realistic failure model based on the processors AVF. In addition, we propose three checkpoint placement methods with constant and variable intervals that determine suitable checkpoint places for the proposed failure model. Our experimental results show that our method, which is implementable on any multicore system, can find the suitable points in which checkpoints should be taken.
多核处理器中的体系结构漏洞感知检查点放置
随着系统复杂性的增加,故障概率也随之大幅增加。因此,系统需要支持容错的技术。检查点技术被广泛用于减少存在故障的长时间运行程序的执行时间,并提高这类系统的可靠性。为了确定使系统性能最优的检查点间隔,目前研究了几种方法。所有这些解决方案的关键参数是系统失效模型,主要假设为指数分布或威布尔分布。但是,这些模型并不完全准确,因为它们不能模拟软误差的影响。本文介绍了一种基于处理器AVF的更为现实的故障模型。此外,我们提出了三种具有恒定和可变间隔的检查点放置方法,以确定所提出的故障模型的合适检查点位置。实验结果表明,我们的方法可以在任何多核系统上实现,可以找到应该采取检查点的合适点。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信