{"title":"Reducing Tail Latencies while Improving Resiliency to Timing Errors for Stream Processing Workloads","authors":"Geoffrey Phi C. Tran, J. Walters, S. Crago","doi":"10.1109/UCC.2018.00028","DOIUrl":null,"url":null,"abstract":"Stream processing is an increasingly popular model for online data processing that can be partitioned into streams of elements. It is commonly used in real-time data analytics services, such as processing Twitter tweets and Internet of Things (IoT) device feeds. Current stream processing frameworks boast high throughput and low average latency. However, users of these frameworks may desire lower tail latencies and better real-time performance for their applications. In practice, there are a number of errors that can affect the performance of stream processing applications, such as garbage collection and resource contention. For some applications, these errors may cause unacceptable violations of real-time constraints. In this paper we propose applying redundancy in the data processing pipeline to increase the resiliency of stream processing applications to timing errors. This results in better real-time performance and a reduction in tail latency. We present a methodology and apply this redundancy in a framework based on Twitter's Heron. Finally, we evaluate the effectiveness of this technique against a range of injected timing errors using benchmarks from Intel's Storm Benchmark. Our results show that redundant tuple processing can effectively reduce the tail latency, and that the number of missed deadlines can also be reduced by up to 94% in the best case. We also study the potential effects of duplication when applied at different stages in the topology. For the topologies in this paper, we further observe that duplication is most effective when computation is redundant at the first bolt. Finally, we evaluate the additional overhead that duplicating tuples brings to a stream processing topology. Our results also show that computation overhead scales slower than communication, and that the real-time performance is improved in spite of the overheads. Overall we conclude that redundancy through duplicated tuples is indeed a powerful tool for increasing the resiliency to intermittent runtime timing errors.","PeriodicalId":288232,"journal":{"name":"2018 IEEE/ACM 11th International Conference on Utility and Cloud Computing (UCC)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE/ACM 11th International Conference on Utility and Cloud Computing (UCC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/UCC.2018.00028","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
Stream processing is an increasingly popular model for the online processing of data that can be partitioned into streams of elements. It is commonly used in real-time data analytics services, such as processing Twitter tweets and Internet of Things (IoT) device feeds. Current stream processing frameworks boast high throughput and low average latency, but users of these frameworks may desire lower tail latencies and better real-time performance for their applications. In practice, a number of runtime effects, such as garbage collection pauses and resource contention, can introduce timing errors that degrade the performance of stream processing applications. For some applications, these timing errors may cause unacceptable violations of real-time constraints. In this paper, we propose applying redundancy in the data processing pipeline to increase the resiliency of stream processing applications to timing errors, which results in better real-time performance and a reduction in tail latency. We present a methodology and apply this redundancy in a framework based on Twitter's Heron. We then evaluate the effectiveness of this technique against a range of injected timing errors using benchmarks from Intel's Storm Benchmark. Our results show that redundant tuple processing effectively reduces tail latency, and that the number of missed deadlines can be reduced by up to 94% in the best case. We also study the potential effects of duplication when it is applied at different stages in the topology; for the topologies in this paper, we observe that duplication is most effective when computation is made redundant at the first bolt. Finally, we evaluate the additional overhead that duplicating tuples brings to a stream processing topology. Our results show that computation overhead scales more slowly than communication overhead, and that real-time performance is improved in spite of these overheads. Overall, we conclude that redundancy through duplicated tuples is a powerful tool for increasing resiliency to intermittent runtime timing errors.
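To make the core idea concrete, the following is a minimal, illustrative sketch of redundant tuple processing in a Storm-style Java topology (Heron preserves the Storm API). It is not the authors' implementation: the class names (DuplicatingBolt, DedupBolt), the field names (msgId, payload), and the single-string upstream schema are assumptions made for illustration. One bolt emits two copies of each tuple tagged with a shared id, and a downstream bolt forwards only the first copy to arrive, so end-to-end latency follows the faster of the two redundant paths when one replica is delayed by a timing error such as a garbage collection pause.

import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

// Illustrative sketch only, not the paper's code. Emits two copies of every
// incoming tuple, tagged with a shared message id, so that downstream
// executors can process the same element redundantly.
class DuplicatingBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        // Assumes the upstream component emits a single string field.
        String payload = input.getString(0);
        String msgId = UUID.randomUUID().toString();
        // Both copies carry the same id; with shuffle grouping they will
        // usually be routed to different downstream executors.
        collector.emit(input, new Values(msgId, payload));
        collector.emit(input, new Values(msgId, payload));
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("msgId", "payload"));
    }
}

// Forwards only the first replica of each message id and drops the second,
// so the result is produced as soon as the faster replica completes.
class DedupBolt extends BaseRichBolt {
    private OutputCollector collector;
    private Set<String> seen;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        this.seen = new HashSet<>(); // a real deployment would bound or expire this set
    }

    @Override
    public void execute(Tuple input) {
        String msgId = input.getStringByField("msgId");
        if (seen.add(msgId)) {
            // First copy wins; the slower duplicate is silently dropped.
            collector.emit(input, new Values(input.getValueByField("payload")));
        }
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("payload"));
    }
}

Per the abstract's observation that duplication is most effective when computation is redundant at the first bolt, a topology built along these lines would place the duplicated work immediately after the spout and deduplicate before the rest of the pipeline. The unbounded seen set above is a simplification; a practical version would expire entries once both replicas have arrived or after a timeout.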