Reducing tail latencies in micro-batch streaming workloads

Proceedings of the 2017 Symposium on Cloud Computing. Pub Date: 2017-09-24. DOI: 10.1145/3127479.3134433

Faria Kalim, A. Tantawi, S. Costache, A. Youssef


Abstract

Spark Streaming discretizes streams of data into micro-batches, each of which is further subdivided into tasks that are processed in parallel to improve job throughput. Previous work [2, 3] has lowered end-to-end latency in Spark Streaming. However, two causes of high tail latency remain unaddressed: 1) data is not load-balanced across tasks, and 2) straggler tasks can increase end-to-end latency by as much as 8 times relative to the median task on a production cluster [1]. We propose a feedback-control mechanism that allows frameworks to adaptively load-balance workloads across tasks according to their processing speeds. Task runtimes are thus equalized, which lowers end-to-end tail latency. Further, this rebalancing reduces load on machines that have transient resource bottlenecks, which resolves the bottlenecks and prevents them from having an enduring impact on task runtimes.
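The abstract does not specify the controller, so the following is only a minimal sketch of what speed-proportional load balancing could look like, not the authors' implementation. It assumes a simple feedback loop: after each micro-batch, every task's processing rate (records per second) is re-estimated with exponential smoothing, and the next micro-batch is split in proportion to those rates. The function names `update_rates` and `split_batch` and the smoothing factor `alpha` are hypothetical and are not part of Spark Streaming.

```python
from typing import List


def update_rates(prev_rates: List[float], records: List[int],
                 runtimes: List[float], alpha: float = 0.5) -> List[float]:
    """Feedback step (assumed design): blend each task's previous rate
    estimate with the rate observed in the last micro-batch."""
    new_rates = []
    for prev, n, t in zip(prev_rates, records, runtimes):
        # Observed throughput in records/second; keep the old estimate
        # if the task reported no usable runtime.
        observed = n / t if t > 0 else prev
        new_rates.append((1 - alpha) * prev + alpha * observed)
    return new_rates


def split_batch(total_records: int, rates: List[float]) -> List[int]:
    """Assign records to tasks in proportion to their estimated rates,
    so that expected runtimes (records / rate) are roughly equal."""
    total_rate = sum(rates)
    if total_rate <= 0:
        raise ValueError("at least one task must have a positive rate")
    exact = [total_records * r / total_rate for r in rates]
    shares = [int(x) for x in exact]
    # Hand out records lost to rounding, largest fractional part first,
    # so the shares always sum to total_records.
    leftover = total_records - sum(shares)
    by_fraction = sorted(range(len(rates)),
                         key=lambda i: exact[i] - shares[i], reverse=True)
    for i in by_fraction[:leftover]:
        shares[i] += 1
    return shares
```

Under these assumptions, a task on a machine with a transient bottleneck reports a lower rate, receives a smaller share of the next micro-batch, and regains load as its measured rate recovers.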

