Design and analysis of a hardware-assisted checkpointing and recovery scheme for distributed applications

Proceedings Seventeenth IEEE Symposium on Reliable Distributed Systems (Cat. No.98CB36281) · Pub Date: 1998-10-20 · DOI: 10.1109/RELDIS.1998.740478

B. Ramamurthy, S. Upadhyaya, B. Bhargava

Citations: 4

Abstract

A checkpointing and recovery scheme that exploits the low latency and high coverage of a hardware error-detection scheme is presented. Message dependency, which is the main source of multi-step rollback in distributed systems, is minimized by a new message validation technique derived from hardware-assisted error detection. The main contribution of this paper is an analytical model that establishes the completeness and correctness of the new scheme. A novel concept, the global state matrix, is defined to track the global state of a distributed system and to assist in recovery. An illustration is given to show the distinction between conventional recovery schemes and the new one.
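To make the role of message dependency concrete, the sketch below shows how rollback propagates between processes through received messages, and how excluding validated messages can confine it. This is not the paper's global state matrix or its validation protocol, which the abstract does not define. It is a hypothetical model that assumes a message flagged `validated` (for example, one whose sender state the hardware detector has already confirmed error-free) creates no rollback dependency. The function `rollback_set`, the `dep` matrix, and the `(sender, receiver, validated)` message tuples are all illustrative names.

```python
from collections import deque


def rollback_set(n, messages, failed):
    """Return the set of processes that must roll back when `failed` rolls back.

    Hypothetical model (not the paper's scheme): `messages` lists
    (sender, receiver, validated) tuples recorded since the last checkpoints,
    and only unvalidated messages create a sender -> receiver dependency.
    """
    # Hypothetical dependency matrix: dep[i][j] is True when process j has
    # received an unvalidated message from process i since its last checkpoint.
    dep = [[False] * n for _ in range(n)]
    for sender, receiver, validated in messages:
        if not validated:
            dep[sender][receiver] = True

    # Breadth-first search: a rollback of i forces a rollback of every j
    # that depends on i, directly or transitively.
    must_roll_back = {failed}
    queue = deque([failed])
    while queue:
        i = queue.popleft()
        for j in range(n):
            if dep[i][j] and j not in must_roll_back:
                must_roll_back.add(j)
                queue.append(j)
    return must_roll_back
```

With a chain of unvalidated messages 0 → 1 → 2 → 3, a failure at process 0 drags all four processes back; if the first message is validated, the rollback stays at process 0. Under this assumed model, the benefit of validation comes entirely from removing edges from the dependency graph.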
