{"title":"Reducing Tail Latencies while Improving Resiliency to Timing Errors for Stream Processing Workloads","authors":"Geoffrey Phi C. Tran, J. Walters, S. Crago","doi":"10.1109/UCC.2018.00028","DOIUrl":null,"url":null,"abstract":"Stream processing is an increasingly popular model for online data processing that can be partitioned into streams of elements. It is commonly used in real-time data analytics services, such as processing Twitter tweets and Internet of Things (IoT) device feeds. Current stream processing frameworks boast high throughput and low average latency. However, users of these frameworks may desire lower tail latencies and better real-time performance for their applications. In practice, there are a number of errors that can affect the performance of stream processing applications, such as garbage collection and resource contention. For some applications, these errors may cause unacceptable violations of real-time constraints. In this paper we propose applying redundancy in the data processing pipeline to increase the resiliency of stream processing applications to timing errors. This results in better real-time performance and a reduction in tail latency. We present a methodology and apply this redundancy in a framework based on Twitter's Heron. Finally, we evaluate the effectiveness of this technique against a range of injected timing errors using benchmarks from Intel's Storm Benchmark. Our results show that redundant tuple processing can effectively reduce the tail latency, and that the number of missed deadlines can also be reduced by up to 94% in the best case. We also study the potential effects of duplication when applied at different stages in the topology. For the topologies in this paper, we further observe that duplication is most effective when computation is redundant at the first bolt. Finally, we evaluate the additional overhead that duplicating tuples brings to a stream processing topology. Our results also show that computation overhead scales slower than communication, and that the real-time performance is improved in spite of the overheads. Overall we conclude that redundancy through duplicated tuples is indeed a powerful tool for increasing the resiliency to intermittent runtime timing errors.","PeriodicalId":288232,"journal":{"name":"2018 IEEE/ACM 11th International Conference on Utility and Cloud Computing (UCC)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE/ACM 11th International Conference on Utility and Cloud Computing (UCC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/UCC.2018.00028","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
Stream processing is an increasingly popular model for the online processing of data that can be partitioned into streams of elements. It is commonly used in real-time data analytics services, such as processing Twitter tweets and Internet of Things (IoT) device feeds. Current stream processing frameworks boast high throughput and low average latency, but users of these frameworks may desire lower tail latencies and better real-time performance for their applications. In practice, a number of runtime effects, such as garbage collection pauses and resource contention, can introduce timing errors that degrade the performance of stream processing applications. For some applications, these timing errors may cause unacceptable violations of real-time constraints. In this paper, we propose applying redundancy in the data processing pipeline to increase the resiliency of stream processing applications to timing errors, which results in better real-time performance and a reduction in tail latency. We present a methodology and apply this redundancy in a framework based on Twitter's Heron. We then evaluate the effectiveness of this technique against a range of injected timing errors using benchmarks from Intel's Storm Benchmark. Our results show that redundant tuple processing effectively reduces tail latency, and that the number of missed deadlines can be reduced by up to 94% in the best case. We also study the potential effects of duplication when it is applied at different stages in the topology; for the topologies in this paper, we observe that duplication is most effective when computation is made redundant at the first bolt. Finally, we evaluate the additional overhead that duplicating tuples brings to a stream processing topology. Our results show that computation overhead scales more slowly than communication overhead, and that real-time performance is improved in spite of these overheads. Overall, we conclude that redundancy through duplicated tuples is a powerful tool for increasing resiliency to intermittent runtime timing errors.
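To make the core idea concrete, the following is a minimal, illustrative sketch of redundant tuple processing in a Storm-style Java topology (Heron preserves the Storm API). It is not the authors' implementation: the class names (DuplicatingBolt, DedupBolt), the field names (msgId, payload), and the single-string upstream schema are assumptions made for illustration. One bolt emits two copies of each tuple tagged with a shared id, and a downstream bolt forwards only the first copy to arrive, so end-to-end latency follows the faster of the two redundant paths when one replica is delayed by a timing error such as a garbage collection pause.

import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

// Illustrative sketch only, not the paper's code. Emits two copies of every
// incoming tuple, tagged with a shared message id, so that downstream
// executors can process the same element redundantly.
class DuplicatingBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        // Assumes the upstream component emits a single string field.
        String payload = input.getString(0);
        String msgId = UUID.randomUUID().toString();
        // Both copies carry the same id; with shuffle grouping they will
        // usually be routed to different downstream executors.
        collector.emit(input, new Values(msgId, payload));
        collector.emit(input, new Values(msgId, payload));
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("msgId", "payload"));
    }
}

// Forwards only the first replica of each message id and drops the second,
// so the result is produced as soon as the faster replica completes.
class DedupBolt extends BaseRichBolt {
    private OutputCollector collector;
    private Set<String> seen;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        this.seen = new HashSet<>(); // a real deployment would bound or expire this set
    }

    @Override
    public void execute(Tuple input) {
        String msgId = input.getStringByField("msgId");
        if (seen.add(msgId)) {
            // First copy wins; the slower duplicate is silently dropped.
            collector.emit(input, new Values(input.getValueByField("payload")));
        }
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("payload"));
    }
}

Per the abstract's observation that duplication is most effective when computation is redundant at the first bolt, a topology built along these lines would place the duplicated work immediately after the spout and deduplicate before the rest of the pipeline. The unbounded seen set above is a simplification; a practical version would expire entries once both replicas have arrived or after a timeout.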