An Optimal Checkpointing Model with Online OCI Adjustment for Stream Processing Applications
Zhuang Yuan, Xiaohui Wei, Hongliang Li, Yongfang Wang, Xubin He
2018 27th International Conference on Computer Communication and Networks (ICCCN), published 2018-07-01
DOI: 10.1002/cpe.5347 (https://doi.org/10.1002/cpe.5347)
Citations: 12
Abstract
Checkpoint-based fault tolerance is widely used to enhance the reliability of Distributed Stream Processing Engines (DSPEs), but checkpointing usually introduces considerable overhead. Choosing the Optimal Checkpoint Interval (OCI) that maximizes processing efficiency is therefore a critical issue. Traditional OCI models assume that the recovery time depends only on the execution time from the last checkpoint to the moment of the failure. They are not suitable for stream processing jobs, because there the recovery time depends on the reprocessing workload, which in turn depends on the real-time input data received before the failure. A new model is needed to choose the OCI for stream processing applications. Moreover, the input data rate of a stream processing job fluctuates over time, so the OCI of an application should also be adjusted dynamically according to the input workload. To solve these problems, we present a novel DSPS Optimal Checkpoint Interval (DOCI) model in this paper. We prove that it maximizes the processing efficiency for a given time period. We also propose an approach that dynamically adjusts the OCI of an application to accommodate real-time workload fluctuations. We conduct simulation experiments to verify the effectiveness of the DOCI model and the efficiency of the online OCI adjustment algorithm. Experimental results with a real-world dataset show that DOCI improves system efficiency by up to 40% compared with existing fault-tolerant approaches.
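To make the contrast in the abstract concrete, the following is a minimal sketch and not the DOCI model itself, which the abstract does not specify. It uses Young's well-known first-order approximation as one example of a traditional OCI model, and a deliberately simplified replay calculation to show why recovery time in a stream job depends on the input received since the last checkpoint rather than on elapsed time alone. The function names, the per-second rate list, and the constant replay rate are all illustrative assumptions.

```python
import math


def young_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Young's first-order approximation of the optimal checkpoint interval.

    Used here only as an example of a traditional OCI model; it depends on
    the checkpoint cost and the mean time between failures, not on input rate.
    """
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


def replay_time(input_rates, seconds_since_checkpoint: int, replay_rate: float) -> float:
    """Simplified recovery time for a stream job (an assumption, not DOCI).

    input_rates[i] is the hypothetical number of tuples that arrived during
    second i after the last checkpoint; replay_rate is the number of tuples
    per second the job can reprocess during recovery.
    """
    backlog = sum(input_rates[:seconds_since_checkpoint])
    return backlog / replay_rate


if __name__ == "__main__":
    # Traditional view: one interval, regardless of workload.
    print(young_interval(checkpoint_cost_s=5.0, mtbf_s=36000.0))

    # Stream view: the same 60 s since the last checkpoint costs a different
    # amount of recovery time depending on how much data arrived.
    quiet = [100] * 60   # hypothetical low input rate (tuples/s)
    busy = [500] * 60    # hypothetical high input rate (tuples/s)
    print(replay_time(quiet, 60, replay_rate=1000.0))
    print(replay_time(busy, 60, replay_rate=1000.0))
```

In this toy setting the failure occurs the same 60 seconds after the checkpoint in both cases, yet the busy period takes five times longer to replay, which is the kind of workload dependence that motivates adjusting the interval online.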