A proposal and evaluation of a coordinated checkpointing technique using incremental snapshots

Mamoru Ohara, Masayuki Arai, Satoshi Fukumoto, Kazuhiko Iwasaki
Electronics and Communications in Japan (Part III: Fundamental Electronic Science), vol. 90, no. 8, pp. 39-53. Published 2007-04-23. DOI: 10.1002/ecjc.20296
Citations: 1

Abstract

Coordinated checkpointing techniques maintain a consistent global state by coordinating the processes with one another. The approach requires that the exchange of application messages be temporarily suspended, but in return the rollback procedure after a fault is simplified and recovery costs are small. As communication costs fall, coordinated techniques are becoming more important. In large-scale systems, however, frequently halting message exchange may seriously impair performance. In this paper we propose a method in which coordination is performed at only a subset of the periodically visited checkpoint generation points, while at the remaining points each process independently generates an incremental snapshot. The method aims both to alleviate the performance degradation caused by coordination and to achieve relatively fast recovery. To evaluate its effectiveness, we estimate the checkpointing overheads and recovery costs using a probabilistic model and simulations, and compare the method with existing coordination methods. The results show that, in environments with a relatively low message frequency, the proposed method is more effective than existing coordination methods in terms of both performance and reliability. In addition, we compare two different delta schemes for representing the incremental snapshots and discuss the environments to which each is suited. © 2007 Wiley Periodicals, Inc. Electron Comm Jpn Pt 3, 90(8): 39-53, 2007; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/ecjc.20296
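
The idea described in the abstract, coordinating at only some checkpoint generation points and taking independent incremental snapshots at the others, can be illustrated with a small Python sketch. This is not the authors' implementation. The `Process` class, the `checkpoint_point` function, the `period` parameter, the dict-based state, and the particular delta representation (changed keys plus removed keys relative to the previous checkpoint point) are all assumptions made for illustration. The abstract does not specify the coordination protocol, the two delta schemes it compares, or how global consistency is restored from the snapshots, so none of that is modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """Toy process whose state is a flat dict of variables (an assumption)."""
    state: dict = field(default_factory=dict)
    base: dict = field(default_factory=dict)    # last full (coordinated) checkpoint
    deltas: list = field(default_factory=list)  # incremental snapshots since base
    _last: dict = field(default_factory=dict)   # state at the previous checkpoint point

    def full_checkpoint(self):
        # Save the whole state and discard older incremental snapshots.
        self.base = dict(self.state)
        self._last = dict(self.state)
        self.deltas.clear()

    def incremental_snapshot(self):
        # Record only what changed since the previous checkpoint point.
        changed = {k: v for k, v in self.state.items()
                   if k not in self._last or self._last[k] != v}
        removed = [k for k in self._last if k not in self.state]
        self.deltas.append((changed, removed))
        self._last = dict(self.state)

    def recover(self):
        # Rebuild the local state: start from the full checkpoint,
        # then replay the incremental snapshots in order.
        restored = dict(self.base)
        for changed, removed in self.deltas:
            restored.update(changed)
            for k in removed:
                restored.pop(k, None)
        self.state = restored
        return restored


def checkpoint_point(processes, index, period):
    """Coordinate only at every `period`-th checkpoint generation point."""
    if index % period == 0:
        # Stand-in for a real coordination protocol (hypothetical):
        # here all processes simply take a full checkpoint together.
        for p in processes:
            p.full_checkpoint()
        return "coordinated"
    # At the remaining points each process snapshots independently.
    for p in processes:
        p.incremental_snapshot()
    return "incremental"
```

In this sketch a larger `period` means fewer coordinated points, and therefore fewer interruptions of message exchange, at the price of more incremental snapshots to replay when a process rebuilds its local state.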
