A new, efficient coordinated checkpointing protocol combined with selective sender-based message logging

2008 IEEE/ACS International Conference on Computer Systems and Applications; Pub Date: 2008-03-31; DOI: 10.1109/AICCSA.2008.4493571

C. Rao, M. M. Naidu

Citations: 24

Abstract

Checkpointing and message logging are popular, general-purpose tools for providing fault tolerance in distributed systems. Most coordinated checkpointing algorithms in the literature do not address the treatment of lost messages, and they suffer from high output commit latency. To overcome these limitations, we propose a new coordinated checkpointing protocol combined with selective sender-based message logging. The protocol is free from the problem of lost messages. The term 'selective' implies that messages are logged only within a specified interval, known as the active interval, thereby reducing message logging overhead. All processes take checkpoints at the end of their respective active intervals, forming a consistent global state. Outside the active interval, process state is not checkpointed. The protocol minimizes several overheads: checkpointing overhead, message logging overhead, recovery overhead, and blocking overhead. Unlike blocking coordinated checkpointing, the proposed protocol incurs less disk contention.
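The idea of selective sender-based logging can be illustrated with a toy, single-threaded sketch. It is not the authors' protocol: the `Process` class, its method names, the integer "state", and the way the active interval is started and ended are all assumptions made for illustration, and coordination messages, stable storage, recovery, and log garbage collection are not modeled. It only shows that a sender records a message when it is sent inside the active interval, and that each process checkpoints when its active interval ends.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """Hypothetical toy process; none of these names come from the paper."""
    pid: int
    active: bool = False                              # True only inside the active interval
    state: int = 0                                    # stand-in for application state
    send_log: list = field(default_factory=list)      # sender-side message log
    checkpoints: list = field(default_factory=list)   # saved copies of `state`

    def begin_active_interval(self):
        self.active = True

    def send(self, dest, payload):
        # Selective logging: the sender keeps a copy only inside the active interval.
        if self.active:
            self.send_log.append((dest.pid, payload))
        dest.receive(payload)

    def receive(self, payload):
        # Toy state change: accumulate the payload.
        self.state += payload

    def end_active_interval(self):
        # Checkpoint at the end of the active interval, then stop logging.
        self.checkpoints.append(self.state)
        self.active = False


p, q = Process(pid=0), Process(pid=1)

p.send(q, 5)                 # outside the active interval: not logged
for proc in (p, q):
    proc.begin_active_interval()
p.send(q, 7)                 # inside the active interval: logged at the sender
for proc in (p, q):
    proc.end_active_interval()
p.send(q, 1)                 # outside again: not logged

print(p.send_log)            # messages p logged
print(p.checkpoints, q.checkpoints)
```

In this sketch only the one message sent inside the active interval lands in the sender's log, which is the source of the reduced logging overhead that the abstract describes.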
