Checkpoints-on-demand with active replication

S. Rangarajan, S. Garg, Yennun Huang
Proceedings Seventeenth IEEE Symposium on Reliable Distributed Systems (Cat. No.98CB36281), 1998-10-20
DOI: 10.1109/RELDIS.1998.740477 (https://doi.org/10.1109/RELDIS.1998.740477)
Citations: 5

Abstract

Checkpointing and roll-back recovery is a well-known technique for recovering from software process failures. Analytical models have been developed for computing the completion time of processes under various checkpointing strategies, such as periodic and random checkpointing. In this paper, we show that with active replication of processes, a strategy based on a mechanism we call checkpoints-on-demand yields a smaller expected completion time than can be achieved with traditional schemes that use periodic checkpoints. With checkpoints-on-demand, when a process fails, it is recovered from an induced checkpoint taken from a replica of the process. Recovery of persistent server processes through state transfer from a replica has been proposed in the context of group communication systems and in the process cloning approach of the Delta-4 architecture, but it has not previously been proposed and analyzed as a mechanism for reducing the expected completion time of a long-running process.
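The comparison described in the abstract can be illustrated with a rough Monte Carlo sketch. This is not the paper's analytical model: the function names, the parameter values, and every modeling choice below are assumptions made for illustration only. The sketch assumes exponentially distributed failures, exactly two replicas, no failures during roll-back recovery in the periodic scheme, and a full restart if the surviving replica fails while its state is being transferred.

```python
import random

def periodic_time(work, interval, ckpt_cost, recovery_cost, fail_rate, rng):
    """One simulated run of a single process with periodic checkpoints.

    Assumption: failures are exponential with rate fail_rate and do not
    occur during recovery.
    """
    total, done = 0.0, 0.0
    while done < work:
        segment = min(interval, work - done)
        # The final segment needs no checkpoint after it.
        cost = segment + (ckpt_cost if done + segment < work else 0.0)
        ttf = rng.expovariate(fail_rate)  # time to next failure
        if ttf >= cost:
            total += cost
            done += segment
        else:
            # Failure: the partial segment is lost; roll back and retry.
            total += ttf + recovery_cost
    return total

def on_demand_time(work, transfer_cost, restart_cost, fail_rate, rng):
    """One simulated run of two active replicas with checkpoints-on-demand.

    Assumption: when one replica fails, the survivor takes an induced
    checkpoint and transfers its state in transfer_cost time units. If the
    survivor also fails during that window, all progress is lost.
    """
    total, done = 0.0, 0.0
    while done < work:
        remaining = work - done
        ttf = rng.expovariate(2 * fail_rate)  # first failure of either replica
        if ttf >= remaining:
            total += remaining
            done = work
            continue
        total += ttf
        done += ttf
        survivor_ttf = rng.expovariate(fail_rate)
        if survivor_ttf >= transfer_cost:
            total += transfer_cost  # failed replica rejoins at current progress
        else:
            total += survivor_ttf + restart_cost
            done = 0.0  # both replicas lost: start over
    return total

def mean_time(fn, runs, seed, **params):
    """Average completion time of fn over independent simulated runs."""
    rng = random.Random(seed)
    return sum(fn(rng=rng, **params) for _ in range(runs)) / runs

if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to exercise the code.
    p = mean_time(periodic_time, 2000, 1, work=100.0, interval=10.0,
                  ckpt_cost=1.0, recovery_cost=2.0, fail_rate=0.01)
    d = mean_time(on_demand_time, 2000, 1, work=100.0, transfer_cost=1.0,
                  restart_cost=2.0, fail_rate=0.01)
    print(f"periodic: {p:.1f}  on-demand: {d:.1f}")
```

Under these assumptions, the on-demand scheme pays no checkpoint overhead during failure-free operation, while the periodic scheme pays `ckpt_cost` at every interval. Which scheme comes out ahead in the simulation depends on the chosen rates and costs, so the output should be read as an illustration of the mechanism rather than a reproduction of the paper's result.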