Fault tolerance of message delivery with cascading copies

Proceedings. PARBASE-90: International Conference on Databases, Parallel Architectures, and Their Applications

Pub Date: 1990-03-07

DOI: 10.1109/PARBSE.1990.77144

H. Al-Jaber, S. Rotenstreich


Abstract

The authors present a fault-tolerance algorithm that guarantees the delivery of a message to its destination despite faults in one or more nodes in a system of loosely coupled processors. This algorithm is distinguished by not using the extra hardware or checkpoint facilities that are common to many algorithms of its type. Instead, it maintains an appropriate number of copies of the message in the nodes through which the message passes. In the case of a fault, the algorithm locates a copy of the message closest to the destination and resumes delivery of the message from this location. Failure detection and recovery are automatic and transparent to the users. The algorithm can be implemented on diskless systems, such as specialized real-time systems or parallel processing systems that use interconnection networks (e.g., a hypercube).
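The abstract gives no implementation details, so the following is only a minimal Python sketch of the recovery idea it describes, not the authors' algorithm. It assumes a fixed linear route, and it assumes that the last `copies` nodes visited each retain one copy of the message. The function names `holders_after` and `closest_surviving_copy`, the `copies` parameter, and the node labels are all hypothetical.

```python
from collections import deque


def holders_after(route, reached, copies):
    """Return the nodes holding a copy once the message has reached `reached`.

    Assumption (not from the paper): the most recent `copies` nodes on the
    route each keep one copy, and older copies are discarded.
    """
    idx = route.index(reached)
    # A bounded deque keeps only the last `copies` nodes visited so far.
    window = deque(route[: idx + 1], maxlen=copies)
    return list(window)


def closest_surviving_copy(route, holders, failed):
    """Return the non-failed copy holder nearest the destination, or None."""
    # Scan the route from the destination backwards.
    for node in reversed(route):
        if node in holders and node not in failed:
            return node
    return None


route = ["A", "B", "C", "D", "E"]  # hypothetical route, A = source, E = destination
holders = holders_after(route, reached="D", copies=2)
restart = closest_surviving_copy(route, holders, failed={"D"})
print(holders, restart)
```

In this sketch, if node D fails while holding the message, delivery would resume from C, the surviving copy closest to the destination. How the real algorithm chooses the number of copies, detects failures, and routes around a failed node is not stated in the abstract.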
