{"title":"Reliable probabilistic checkpointing","authors":"Hyo-Chang Nam, Jong Kim, S. Hong, Sunggu Lee","doi":"10.1109/PRDC.1999.816224","DOIUrl":"https://doi.org/10.1109/PRDC.1999.816224","url":null,"abstract":"Recently proposed probabilistic checkpointing has one drawback, namely aliasing. When analyzed, 64-bit signatures show negligible possibility of aliasing. But in practice, the shift-XOR signature generation function used with probabilistic checkpointing shows a high aliasing rate, which limits the practicality of probabilistic checkpointing. In this paper, two enhancements are considered to make probabilistic checkpointing more reliable. One is the signature generation function and the other is the recovery scheme. In the signature generation function part, we propose two signature generation functions: HALF for small block sizes (less than or equal to 256 bytes) and C-HALF (CRC combined HALF) for large block sizes (larger than 256 bytes), which have an aliasing probability similar to analytic results and small overhead. In the recovery scheme part, we propose a recovery scheme which ensures the safety of probabilistic checkpointing. To examine the correctness of previous checkpoints at recovery time, the proposed recovery scheme uses a spare node. We analyze the recovery scheme using a mathematical model. Also an optimal checkpoint interval is derived using the model.","PeriodicalId":389294,"journal":{"name":"Proceedings 1999 Pacific Rim International Symposium on Dependable Computing","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121303541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interconnecting lock-step synchronous fault-tolerant systems based on voting and error-correcting codes","authors":"T. Krol","doi":"10.1109/PRDC.1999.816213","DOIUrl":"https://doi.org/10.1109/PRDC.1999.816213","url":null,"abstract":"The correctness of the behavior of a fault-tolerant system depends among other things on the correct distribution of the data descending from unreliable I/O devices over the modules of the fault-tolerant system, the so-called input-problem. More generally, a maliciously behaving system, whether it is fault-tolerant or not, should never defeat a correctly functioning fault-tolerant system, i.e. a system which does not contain more faulty modules than it is designed to tolerate. This paper presents a new class of synchronous deterministic non-authenticated algorithms for reaching Byzantine agreement on data descending from other (fault-tolerant) devices. The algorithms are based on voting and error-correcting codes and require considerably less data communication than the existing algorithms, whereas the number of rounds and the number of modules meet the minimum bounds.","PeriodicalId":389294,"journal":{"name":"Proceedings 1999 Pacific Rim International Symposium on Dependable Computing","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128458425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}