{"title":"An efficient checkpointing protocol for the minimal characterization of operational rollback-dependency trackability","authors":"Islene C. Garcia, L. E. Buzato","doi":"10.1109/RELDIS.2004.1353013","DOIUrl":null,"url":null,"abstract":"A checkpointing protocol that enforces rollback-dependency trackability (RDT) during the progress of a distributed computation must induce processes to take forced checkpoints to avoid the formation of nontrackable rollback dependencies. A protocol based on the minimal characterization of RDT tests only the smallest set of nontrackable dependencies. The literature indicated that this approach would require the processes to maintain and propagate O(n^2) control information, where n is the number of processes in the computation. In this paper, we present a protocol that implements this approach using only O(n) control information.","PeriodicalId":142327,"journal":{"name":"Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004.","volume":"43 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2004-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"15","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/RELDIS.2004.1353013","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 15
Abstract
A checkpointing protocol that enforces rollback-dependency trackability (RDT) during the progress of a distributed computation must induce processes to take forced checkpoints to avoid the formation of nontrackable rollback dependencies. A protocol based on the minimal characterization of RDT tests only the smallest set of nontrackable dependencies. The literature indicated that this approach would require the processes to maintain and propagate O(n^2) control information, where n is the number of processes in the computation. In this paper, we present a protocol that implements this approach using only O(n) control information.
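To make "O(n) control information" and "forced checkpoints" concrete, the following is a minimal sketch of a much simpler, well-known RDT protocol, Fixed-Dependency-After-Send (FDAS). It is not the protocol presented in this paper and does not implement the minimal characterization; it only illustrates how a dependency vector of n integers can be piggybacked on every message and used to decide when a checkpoint must be forced. The class name `Process`, its fields, and the convention that a process's own vector entry starts at 1 are assumptions made for this illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Process:
    """Hypothetical process running the FDAS rule (illustration only)."""
    pid: int
    n: int
    dv: List[int] = field(default_factory=list)  # dependency vector, n entries
    sent_in_interval: bool = False                # any send since last checkpoint?
    forced_checkpoints: int = 0

    def __post_init__(self) -> None:
        # Assumed convention: 0 means "no known dependency"; the process's
        # own entry holds its current checkpoint interval, starting at 1.
        self.dv = [0] * self.n
        self.dv[self.pid] = 1

    def checkpoint(self) -> None:
        # Taking a checkpoint opens a new checkpoint interval.
        self.dv[self.pid] += 1
        self.sent_in_interval = False

    def send(self) -> List[int]:
        # The piggybacked control information is a copy of the vector:
        # n integers per message, i.e. O(n).
        self.sent_in_interval = True
        return list(self.dv)

    def receive(self, msg_dv: List[int]) -> None:
        # FDAS rule: once a message has been sent in the current interval,
        # the vector must not change until the next checkpoint. If the
        # incoming message carries a new dependency, force a checkpoint
        # before delivering it.
        brings_new_dependency = any(m > d for m, d in zip(msg_dv, self.dv))
        if self.sent_in_interval and brings_new_dependency:
            self.forced_checkpoints += 1
            self.checkpoint()
        # Deliver: merge dependencies with a componentwise maximum.
        self.dv = [max(m, d) for m, d in zip(msg_dv, self.dv)]
```

FDAS forces a checkpoint whenever a new dependency arrives after a send, which is a coarser test than one based on the minimal characterization. The abstract's claim is that the finer test can also be carried out with control information of the same linear order.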