Checkpoint processing in distributed systems software using synchronized clocks
S. Neogy, A. Sinha, P. K. Das
Proceedings International Conference on Information Technology: Coding and Computing, 2001 (published 2001-04-02)
DOI: 10.1109/ITCC.2001.918855
Taking checkpoints in a truly distributed manner, that is, without a global checkpoint coordinator, is tricky. This paper addresses the problem for a system that uses loosely synchronized clocks. Each constituent process takes its checkpoints at predetermined checkpoint instants according to its own clock. Because these checkpoints are not taken at the same instant, some synchronization among the processes is needed to determine a globally consistent set of checkpoints. This checkpoint-synchronization information is appended to the clock synchronization messages exchanged by the constituent processes. Communication in this system is synchronous, so a process may be blocked in communication when a checkpoint instant arrives; a blocked process takes its checkpoint after it unblocks. The paper shows that the set of i-th checkpoints is consistent, so after a failure the system needs to roll back only to the last saved state.
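To make the consistency claim concrete, here is a minimal sketch, not the paper's protocol. It assumes a simulated model in which each process's clock differs from real time by a fixed offset, and each synchronous message is a rendezvous that completes for sender and receiver at the same real time. Consistency is tested with the standard orphan-message criterion: no message may be recorded as received in one checkpoint while its send is missing from the sender's checkpoint. The deferral rule from the abstract is applied: a process blocked at its checkpoint instant checkpoints when it unblocks. The names `Message`, `checkpoint_time`, `orphan_messages`, and `is_consistent`, the interval `T`, and the clock offsets are all hypothetical. The checkpoint-synchronization information the paper piggybacks on clock synchronization messages is not modeled, because the abstract does not describe it.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """One synchronous (rendezvous) message; both ends complete at `time`."""
    sender: str
    receiver: str
    time: float  # hypothetical: real time at which the transfer completes


def checkpoint_time(local_instant: float, clock_offset: float,
                    blocked: list[tuple[float, float]]) -> float:
    """Real time at which a process takes its checkpoint (simulated model).

    The process checkpoints when its local clock reads `local_instant`;
    its clock is ahead of real time by `clock_offset` (assumed constant).
    If it is blocked in communication during [start, end) at that moment,
    it defers the checkpoint until it unblocks at `end`.
    """
    t = local_instant - clock_offset
    for start, end in blocked:
        if start <= t < end:
            return end
    return t


def orphan_messages(checkpoints: dict[str, float],
                    messages: list[Message]) -> list[Message]:
    """Messages recorded as received but not as sent (orphans).

    Assumption: an event at real time t is recorded in a checkpoint
    taken at real time c iff t <= c.
    """
    return [m for m in messages
            if m.time <= checkpoints[m.receiver]   # receive is recorded
            and m.time > checkpoints[m.sender]]    # send is not recorded


def is_consistent(checkpoints: dict[str, float],
                  messages: list[Message]) -> bool:
    """A set of checkpoints is consistent iff it has no orphan messages."""
    return not orphan_messages(checkpoints, messages)


T = 10.0  # hypothetical checkpoint interval, in local clock units
i = 1     # examine the first checkpoint of each process

# Case 1: skewed clocks, no blocking. P's clock is 0.5 ahead, Q's 0.5 behind,
# so a message P -> Q at real time 10.0 falls between the two checkpoints.
cp1 = {"P": checkpoint_time(i * T, +0.5, []),
       "Q": checkpoint_time(i * T, -0.5, [])}
print(cp1, is_consistent(cp1, [Message("P", "Q", 10.0)]))
# {'P': 9.5, 'Q': 10.5} False

# Case 2: exact clocks, both processes blocked in a rendezvous over
# [9.0, 11.0) when the checkpoint instant arrives; both defer to 11.0.
cp2 = {"P": checkpoint_time(i * T, 0.0, [(9.0, 11.0)]),
       "Q": checkpoint_time(i * T, 0.0, [(9.0, 11.0)])}
print(cp2, is_consistent(cp2, [Message("P", "Q", 11.0)]))
# {'P': 11.0, 'Q': 11.0} True
```

Case 1 shows why purely clock-driven checkpoints need extra synchronization once clocks are skewed. Case 2 shows the deferral rule keeping a rendezvous on the same side of both checkpoints. The deferral rule alone does not cover every interleaving in this model; closing those gaps is the role of the synchronization information described in the paper.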