Fast and Precise recovery in Stream processing based on Distributed Cache
Yingying Zheng, Wei Wang, Lijie Xu, Zhen Tang, Zhongshan Ren, Jun Wei, Dan Ye
Proceedings of the 9th Asia-Pacific Symposium on Internetware, 2017-09-23
DOI: 10.1145/3131704.3131724 (https://doi.org/10.1145/3131704.3131724)
Abstract
A stream processing system (SPS) that runs over a long period of time must cope with node failures. In addition, an "exactly once" semantic guarantee is increasingly important for SPSs in some scenarios. Such precise semantics are generally achieved with global snapshots, which store state and records in external reliable storage, or by relying on transactions. However, these approaches suffer from high recovery latency because of heavy disk I/O overhead. To reduce the latency of failure recovery, we save the intermediate results produced during stream processing and propose an algorithm, DCAS, which snapshots state asynchronously to implement precise recovery. In addition, we use an in-memory distributed cache to store the intermediate results and snapshots to reduce recovery latency. We evaluate our failure recovery approach in terms of recovery latency and runtime overhead. The experimental results show that our approach is 2 to 6 times faster than conventional failure recovery approaches and incurs a 6% runtime overhead.
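The abstract does not spell out how DCAS works, so the following is only a minimal Python sketch of the general idea it names: snapshot operator state asynchronously into an in-memory cache, keep intermediate results there as well, and recover by restoring the latest snapshot and replaying only the records that came after it. It is not the authors' algorithm. `InMemoryCache` is a hypothetical single-process stand-in for a distributed cache, and `CountingOperator`, `snapshot_every`, and the per-record sequence numbers are assumptions made for illustration.

```python
import copy
import threading
from concurrent.futures import ThreadPoolExecutor


class InMemoryCache:
    """Hypothetical stand-in for an in-memory distributed cache (a locked dict)."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def put(self, key, value):
        with self._lock:
            self._data[key] = value

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)


class CountingOperator:
    """Stateful word-count operator with asynchronous snapshots (illustrative only)."""

    def __init__(self, cache, snapshot_every=3):
        self.cache = cache
        self.snapshot_every = snapshot_every
        self.counts = {}
        self.last_seq = -1  # sequence number of the last record applied to state
        # A single worker keeps snapshot writes in submission order.
        self._writer = ThreadPoolExecutor(max_workers=1)
        self._pending = []

    def process(self, seq, word):
        if seq <= self.last_seq:
            return  # already reflected in state: skip so each record counts once
        self.counts[word] = self.counts.get(word, 0) + 1
        self.last_seq = seq
        # Keep the intermediate result in the cache, keyed by sequence number.
        self.cache.put(("out", seq), (word, self.counts[word]))
        if (seq + 1) % self.snapshot_every == 0:
            self._snapshot_async()

    def _snapshot_async(self):
        # Copy state on the processing thread, write it to the cache in the background.
        frozen = (self.last_seq, copy.deepcopy(self.counts))
        self._pending.append(self._writer.submit(self.cache.put, "snapshot", frozen))

    def flush(self):
        """Wait until all submitted snapshot writes have completed."""
        for future in self._pending:
            future.result()
        self._pending.clear()

    @classmethod
    def recover(cls, cache, source, snapshot_every=3):
        """Restore the latest snapshot, then replay the source past its offset."""
        op = cls(cache, snapshot_every)
        snap = cache.get("snapshot")
        if snap is not None:
            op.last_seq, op.counts = snap[0], copy.deepcopy(snap[1])
        for seq, word in source:
            op.process(seq, word)  # records at or before the snapshot are skipped
        op.flush()
        return op


records = [(0, "a"), (1, "b"), (2, "a"), (3, "a"), (4, "b")]
cache = InMemoryCache()
op = CountingOperator(cache)
for seq, word in records:
    op.process(seq, word)
op.flush()

# Simulate a node failure: the operator's local state is lost, the cache survives.
del op
recovered = CountingOperator.recover(cache, records)
print(recovered.counts)
```

In this sketch the snapshot taken after sequence number 2 lets recovery skip the first three records and replay only the last two. The state copy still happens on the processing thread; only the cache write is deferred, which is one simple way to keep a snapshot consistent without pausing for the write itself.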