Generic Checkpointing Support for Stream-based State-Machine Replication
Laura Lawniczak, Marco Ammon, T. Distler
Proceedings of the 10th Workshop on Principles and Practice of Consistency for Distributed Data, 2023-05-08
DOI: https://doi.org/10.1145/3578358.3591329
Citations: 0
Abstract
Stream-based replication facilitates the deployment and operation of state-machine replication protocols by running them as applications on top of data-stream processing frameworks. By taking advantage of platform-provided features, this approach significantly reduces implementation complexity at the protocol level. To further extend these benefits, in this paper we examine how the concept can be used to provide generic support for creating, storing, and applying checkpoints of replica states, both for catch-up and garbage collection and for recovering failed replicas. Specifically, we present three checkpointing-mechanism designs with different degrees of platform involvement and evaluate them in the context of Twitter's stream-processing engine Heron.
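
To make the three checkpoint operations named in the abstract concrete, the following is a minimal, framework-independent sketch of creating, storing, and applying a checkpoint, with log garbage collection and catch-up of a fresh replica. It is not one of the paper's three designs and does not use Heron's API; the names `Checkpoint`, `Replica`, and `checkpoint_interval`, as well as the key-value application state, are assumptions made purely for illustration.

```python
import copy
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Checkpoint:
    seq: int     # sequence number of the last request reflected in the state
    state: dict  # snapshot of the application state at that point


@dataclass
class Replica:
    # Hypothetical setting: take a checkpoint every N executed requests.
    checkpoint_interval: int = 3
    state: dict = field(default_factory=dict)
    last_seq: int = 0
    log: list = field(default_factory=list)  # (seq, key, value) since last checkpoint
    stable: Optional[Checkpoint] = None      # most recently stored checkpoint

    def execute(self, seq, key, value):
        """Apply the next ordered request; ignore duplicates, reject gaps."""
        if seq <= self.last_seq:
            return
        if seq != self.last_seq + 1:
            raise ValueError(f"gap: expected {self.last_seq + 1}, got {seq}")
        self.state[key] = value
        self.last_seq = seq
        self.log.append((seq, key, value))
        if seq % self.checkpoint_interval == 0:
            self.create_checkpoint()

    def create_checkpoint(self):
        """Create and store a snapshot, then garbage-collect covered log entries."""
        self.stable = Checkpoint(self.last_seq, copy.deepcopy(self.state))
        self.log = [entry for entry in self.log if entry[0] > self.stable.seq]

    def apply_checkpoint(self, cp):
        """Install a checkpoint that is newer than the local state."""
        if cp.seq <= self.last_seq:
            return
        self.state = copy.deepcopy(cp.state)
        self.last_seq = cp.seq
        self.stable = cp
        self.log = []


# An up-to-date replica executes four requests and checkpoints after the third.
leader = Replica()
for seq, (key, value) in enumerate([("a", 1), ("b", 2), ("c", 3), ("a", 4)], start=1):
    leader.execute(seq, key, value)

# A fresh (or recovering) replica catches up: checkpoint first, then the log suffix.
fresh = Replica()
fresh.apply_checkpoint(leader.stable)
for entry in leader.log:
    fresh.execute(*entry)
```

In this sketch the replica itself decides when to checkpoint and where to keep the snapshot. The designs in the paper differ in how much of this responsibility is handed to the stream-processing platform.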