Fault tolerance of message delivery with cascading copies
H. Al-Jaber, S. Rotenstreich
Proceedings. PARBASE-90: International Conference on Databases, Parallel Architectures, and Their Applications, 1990-03-07
DOI: 10.1109/PARBSE.1990.77144 (https://doi.org/10.1109/PARBSE.1990.77144)
Citation count: 0
Abstract
The authors present a fault-tolerance algorithm that guarantees delivery of a message to its destination despite faults in one or more nodes of a system of loosely coupled processors. Unlike many algorithms of its type, it requires no extra hardware or checkpoint facilities. Instead, it maintains an appropriate number of copies of the message in the nodes through which the message passes. When a fault occurs, the algorithm locates the copy of the message closest to the destination and resumes delivery from that location. Failure detection and recovery are automatic and transparent to users. The algorithm can be implemented on diskless systems, such as specialized real-time systems or parallel processing systems that use interconnection networks (e.g., a hypercube).
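The abstract does not say how many copies are kept or how the network routes around a failed node, so the following is only a minimal sketch of the recovery idea under assumptions of our own. It assumes that the last `copies` nodes on the message's path each hold a copy, with older nodes discarding theirs. It also assumes that the network (for instance a hypercube) offers a detour around any failed node. The hypothetical helpers `copy_window`, `recovery_point`, and `resume_route` are illustrations, not the authors' method. The sketch shows how recovery scans the copy holders from newest to oldest, picks the surviving copy closest to the destination, and resumes delivery from it.

```python
from typing import List, Optional, Set


def copy_window(reached: int, copies: int) -> List[int]:
    """Path indices that still hold a copy once the message has reached
    path[reached], newest first. Hypothetical policy: only the last
    `copies` nodes on the path keep one; older nodes discard theirs."""
    oldest = max(reached - copies + 1, 0)
    return list(range(reached, oldest - 1, -1))


def recovery_point(path: List[str], reached: int, failed: Set[str],
                   copies: int) -> Optional[int]:
    """Index of the surviving copy closest to the destination, or None if
    every node in the copy window has failed (the message is lost)."""
    for idx in copy_window(reached, copies):
        if path[idx] not in failed:
            return idx
    return None


def resume_route(path: List[str], resume: int,
                 failed: Set[str]) -> Optional[List[str]]:
    """Nodes the message still visits after recovery. Assumes the network
    offers a detour around each failed node, so failed nodes are simply
    dropped from the route. Returns None if the destination itself failed."""
    if path[-1] in failed:
        return None
    return [node for node in path[resume:] if node not in failed]


if __name__ == "__main__":
    path = ["S", "A", "B", "C", "D", "E", "T"]   # S = source, T = destination
    # The message has reached D (index 4) when C and D fail together.
    failed = {"C", "D"}
    resume = recovery_point(path, reached=4, failed=failed, copies=3)
    print(path[resume])                          # B
    print(resume_route(path, resume, failed))    # ['B', 'E', 'T']
```

In this toy model, keeping copies on `copies` consecutive nodes tolerates up to `copies - 1` simultaneous failures inside that window. With `copies=2` in the example above, both holders (C and D) are lost and `recovery_point` returns `None`. This illustrates why the number of copies has to be chosen to match the number of faults to be tolerated.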