Q-Spark: QoS Aware Micro-batch Stream Processing System Using Spark

2021 IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion (ACSOS-C). Pub Date: 2021-09-01. DOI: 10.1109/ACSOS-C52956.2021.00027

Suyeon Lee, Yeonwoo Jeong, Minwoo Kim, Sungyong Park


Citations: 1

Abstract

Unlike event-driven stream processing systems, micro-batch stream processing systems collect input data for a certain period of time before processing it. They do so because they focus on improving the throughput of the entire system rather than reducing the latency of each data item. However, ingesting a continuous stream of data and analyzing it in real time is also necessary in micro-batch stream processing systems, in settings where reducing latency is more important than improving throughput. This paper presents Q-Spark, a QoS (Quality of Service) aware micro-batch stream processing system implemented on Apache Spark. The main idea of the Q-Spark design is to set a deadline for each query and dynamically adjust the batch size so that the deadline is not exceeded. Because Q-Spark executes a micro-batch after buffering as much data as possible until the deadline set for each query is reached, it guarantees the QoS requirement of each query while maintaining throughput comparable to the original Spark batching mechanism. Experimental results show that the tail latency of Q-Spark is always bounded by the deadline, in contrast to the original Spark, where data is buffered for a fixed period using triggers. As a result, Q-Spark reduces the tail latency per query by up to 75% while keeping throughput stable compared to the original Spark, which has no concept of a deadline.
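The abstract does not give the batching algorithm itself, so the following is only a minimal sketch of the general idea of deadline-bounded buffering, not Q-Spark's implementation. The function `plan_batches`, the `Batch` type, and the linear cost model (`fixed_cost` plus `per_record_cost` per record) are all hypothetical names and assumptions introduced for illustration. The sketch keeps adding records to a buffer as long as a batch containing them could still finish before the oldest buffered record's deadline, and otherwise closes the batch at the latest start time that still meets that deadline.

```python
from typing import List, NamedTuple


class Batch(NamedTuple):
    start: float            # time the micro-batch begins executing
    finish: float           # start + estimated processing cost
    records: List[float]    # arrival timestamps of the buffered records


def plan_batches(arrivals, deadline, fixed_cost, per_record_cost):
    """Greedy deadline-bounded batching (illustrative only).

    Assumptions (not from the paper): processing cost is linear in batch
    size, batches never queue behind one another, and the deadline is
    measured from a record's arrival to the end of its batch.
    """
    def cost(n):
        return fixed_cost + per_record_cost * n

    if cost(1) > deadline:
        raise ValueError("even a single-record batch would miss the deadline")

    batches = []
    buf = []

    def flush():
        # Start as late as the oldest record allows, so buffering is maximal.
        start = buf[0] + deadline - cost(len(buf))
        batches.append(Batch(start, start + cost(len(buf)), list(buf)))
        buf.clear()

    for t in sorted(arrivals):
        # Would a batch that also contains this record finish too late
        # for the oldest buffered record? If so, close the current batch.
        if buf and t + cost(len(buf) + 1) > buf[0] + deadline:
            flush()
        buf.append(t)
    if buf:
        flush()
    return batches


if __name__ == "__main__":
    plan = plan_batches([0.0, 0.125, 0.25, 0.875, 1.0, 1.5],
                        deadline=1.0, fixed_cost=0.25, per_record_cost=0.125)
    for b in plan:
        print(len(b.records), b.start, b.finish)
```

Under these assumptions every record's latency (batch finish time minus arrival time) stays at or below `deadline`, while each batch is as large as the deadline permits. A real system would also have to estimate processing cost at run time and account for batches queuing behind one another, both of which this sketch ignores.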
