Checkpoints-on-demand with active replication
S. Rangarajan, S. Garg, Yennun Huang
Proceedings Seventeenth IEEE Symposium on Reliable Distributed Systems (Cat. No.98CB36281), 1998-10-20
DOI: 10.1109/RELDIS.1998.740477
Citation count: 5
Abstract
Checkpointing and rollback recovery is a well-known technique for recovering from software process failures. Analytical models have been developed for computing the completion time of processes that use various checkpointing strategies, such as periodic checkpointing and random checkpointing. In this paper, we show that with active replication of processes, a strategy that uses a mechanism we call checkpoints-on-demand results in a smaller expected completion time than can be achieved with traditional schemes that use periodic checkpoints. With checkpoints-on-demand, when a process fails, it is recovered from an induced checkpoint taken of a replica of the process. Recovery of persistent server processes through state transfer from a replica has been proposed in the context of group communication systems and in the process cloning approach of the Delta-4 architecture. However, it has not previously been proposed and analyzed as a mechanism for reducing the expected completion time of a long-running process.
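To make the comparison concrete, here is a small Monte Carlo sketch. It is not the authors' analytical model. Every name and parameter in it (`periodic_run`, `on_demand_run`, `mean_time`, `tau`, `ckpt_cost`, `recovery`, `lam`) is hypothetical, and it rests on simplifying assumptions: each replica fails independently with exponentially distributed times between failures at rate `lam`, no failures occur during recovery, exactly two replicas run in lockstep, and the surviving replica stops computing while it takes the induced checkpoint and transfers it to restart the failed replica. If the survivor also fails during that window, both replicas restart from the latest completed induced checkpoint (or from the start of the job).

```python
import math
import random


def periodic_run(work, tau, ckpt_cost, recovery, lam, rng):
    """One run of a single (unreplicated) process that checkpoints after every `tau` units of work.

    Assumption: failures are exponential with rate `lam` and never strike during recovery.
    """
    t = 0.0
    n_seg = math.ceil(work / tau)
    for i in range(n_seg):
        seg = min(tau, work - i * tau)
        # Every segment except the last ends with a checkpoint costing `ckpt_cost`.
        need = seg + (ckpt_cost if i < n_seg - 1 else 0.0)
        while True:
            fail_at = rng.expovariate(lam)  # time until the next failure
            if fail_at >= need:
                t += need
                break
            # Failure: the partial segment is lost; recover and retry from the last checkpoint.
            t += fail_at + recovery
    return t


def on_demand_run(work, ckpt_cost, recovery, lam, rng):
    """One run of two active replicas that use induced (on-demand) checkpoints only.

    Assumption: the survivor pauses for `ckpt_cost + recovery` while it checkpoints
    and transfers its state to restart the failed replica.
    """
    t = 0.0
    done = 0.0  # progress of the running replica(s)
    safe = 0.0  # progress captured by the latest completed induced checkpoint
    while True:
        first = rng.expovariate(2 * lam)  # first failure among two independent replicas
        if done + first >= work:
            return t + (work - done)
        t += first
        done += first
        survivor_fail = rng.expovariate(lam)
        if survivor_fail >= ckpt_cost + recovery:
            # Failed replica restored from the survivor's induced checkpoint.
            t += ckpt_cost + recovery
            safe = done
            continue
        if survivor_fail >= ckpt_cost:
            safe = done  # induced checkpoint finished before the survivor failed
        # Both replicas are down: restart both from the latest induced checkpoint.
        t += survivor_fail + recovery
        done = safe


def mean_time(run, trials, seed, **params):
    """Average completion time of `run` over `trials` seeded simulations."""
    rng = random.Random(seed)
    return sum(run(rng=rng, **params) for _ in range(trials)) / trials


if __name__ == "__main__":
    # Illustrative, made-up parameters (time units are arbitrary).
    params = dict(work=100.0, ckpt_cost=1.0, recovery=2.0, lam=0.01)
    per = mean_time(periodic_run, 20000, seed=1, tau=10.0, **params)
    od = mean_time(on_demand_run, 20000, seed=1, **params)
    print(f"periodic: {per:.1f}  on-demand: {od:.1f}")
```

Because the sketch ignores the resource cost of running a second replica and failures during recovery, treat its output as a qualitative illustration of why masking most failures with a replica can beat periodic rollback, not as a reproduction of the paper's results.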