Checkpoints-on-demand with active replication

Proceedings Seventeenth IEEE Symposium on Reliable Distributed Systems (Cat. No.98CB36281). Pub Date: 1998-10-20. DOI: 10.1109/RELDIS.1998.740477

S. Rangarajan, S. Garg, Yennun Huang

Citations: 5

Abstract

Checkpointing and roll-back recovery is a well-known technique for recovering from software process failures. Analytical models have been developed for computing the completion time of processes that use various checkpointing strategies, such as periodic checkpointing and random checkpointing. In this paper, we show that with active replication of processes, a strategy that uses a mechanism we call checkpoints-on-demand results in an expected completion time smaller than that achievable with traditional schemes that use periodic checkpoints. With checkpoints-on-demand, when a process fails, it is recovered from an induced checkpoint taken of a replica of the process. Recovery of persistent server processes through state transfer from a replica has been proposed in the context of group communication systems and in the process cloning approach of the Delta-4 architecture. However, it has not previously been proposed and analyzed as a mechanism for reducing the expected completion time of a long-running process.
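
The contrast the abstract draws can be illustrated with a toy simulation. The sketch below is not the paper's analytical model; it is a minimal illustration under assumptions that do not come from the source: failures arrive independently at an exponential rate per process, no failure occurs while a recovery is in progress, exactly two replicas run in lockstep, and the surviving replica pauses while the induced checkpoint is taken and restored. All function names and parameter values are hypothetical.

```python
import random


def periodic_time(work, interval, ckpt_cost, recovery_cost, fail_rate, rng):
    """Completion time of one process with periodic checkpoints (toy model)."""
    clock = 0.0
    done = 0.0  # work already saved by the last checkpoint
    while done < work:
        segment = min(interval, work - done)
        # A checkpoint follows every segment except the final one.
        cost = segment + (ckpt_cost if done + segment < work else 0.0)
        ttf = rng.expovariate(fail_rate) if fail_rate > 0 else float("inf")
        if ttf >= cost:
            clock += cost
            done += segment
        else:
            # Failure: the partial segment is lost; roll back and retry.
            # Assumption: no further failure during the recovery itself.
            clock += ttf + recovery_cost
    return clock


def on_demand_time(work, ckpt_cost, recovery_cost, fail_rate, rng):
    """Completion time with two active replicas and checkpoints-on-demand (toy model)."""
    clock = 0.0
    done = 0.0
    while done < work:
        remaining = work - done
        # Time until the first of the two replicas fails.
        ttf = rng.expovariate(2 * fail_rate) if fail_rate > 0 else float("inf")
        if ttf >= remaining:
            clock += remaining
            done = work
        else:
            # The survivor keeps all progress. Assumption: it pauses while an
            # induced checkpoint is taken and the failed replica restores from it,
            # and it does not fail during that window.
            clock += ttf + ckpt_cost + recovery_cost
            done += ttf
    return clock


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to exercise the two functions.
    rng = random.Random(0)
    runs = 2000
    args = dict(ckpt_cost=1.0, recovery_cost=2.0, fail_rate=0.01)
    periodic = sum(periodic_time(100.0, 10.0, rng=rng, **args) for _ in range(runs)) / runs
    on_demand = sum(on_demand_time(100.0, rng=rng, **args) for _ in range(runs)) / runs
    print(f"mean completion time, periodic checkpoints: {periodic:.2f}")
    print(f"mean completion time, checkpoints-on-demand: {on_demand:.2f}")
```

In this sketch the periodic scheme pays a checkpoint cost on every interval whether or not a failure occurs and loses the work done since the last checkpoint on each failure, whereas the on-demand scheme pays a checkpoint cost only when a replica actually fails and loses no completed work. How the two averages compare depends on the parameters chosen, and the sketch ignores the cost of running the second replica.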



