{"title":"A new, efficient coordinated checkpointing protocol combined with selective sender-based message logging","authors":"C. Rao, M. M. Naidu","doi":"10.1109/AICCSA.2008.4493571","DOIUrl":null,"url":null,"abstract":"Checkpointing and message logging are the popular and general-purpose tools for providing fault- tolerance in distributed systems. The most of the Coordinated checkpointing algorithms available in the literature have not addressed about treatment of the lost messages and these algorithms suffer from high output commit latency. To overcome the above limitations, we propose a new coordinated checkpointing protocol combined with selective sender-based message logging. The protocol is free from the problem of lost messages. The term 'selective' implies that messages are logged only within a specified interval known as active interval, thereby reducing message logging overhead. All processes take checkpoints at the end of their respective active intervals forming a consistent global state. Outside the active interval there is no checkpointing of process state. This protocol minimizes different overheads i.e. checkpointing overhead, message logging overhead, recovery overhead and blocking overhead. Unlike blocking coordinated checkpointing, the disk contentions are less in the proposed protocol.","PeriodicalId":234556,"journal":{"name":"2008 IEEE/ACS International Conference on Computer Systems and Applications","volume":"59 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2008-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"24","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2008 IEEE/ACS International Conference on Computer Systems and Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/AICCSA.2008.4493571","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 24
Abstract
Checkpointing and message logging are the popular and general-purpose tools for providing fault- tolerance in distributed systems. The most of the Coordinated checkpointing algorithms available in the literature have not addressed about treatment of the lost messages and these algorithms suffer from high output commit latency. To overcome the above limitations, we propose a new coordinated checkpointing protocol combined with selective sender-based message logging. The protocol is free from the problem of lost messages. The term 'selective' implies that messages are logged only within a specified interval known as active interval, thereby reducing message logging overhead. All processes take checkpoints at the end of their respective active intervals forming a consistent global state. Outside the active interval there is no checkpointing of process state. This protocol minimizes different overheads i.e. checkpointing overhead, message logging overhead, recovery overhead and blocking overhead. Unlike blocking coordinated checkpointing, the disk contentions are less in the proposed protocol.