Q-Spark: QoS Aware Micro-batch Stream Processing System Using Spark

Suyeon Lee, Yeonwoo Jeong, Minwoo Kim, Sungyong Park
DOI: 10.1109/ACSOS-C52956.2021.00027
Published in: 2021 IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion (ACSOS-C), September 2021
Citations: 1

Abstract

Unlike event-driven stream processing systems, micro-batch stream processing systems collect input data for a certain period of time before processing it. They do this because they focus on improving the throughput of the entire system rather than reducing the latency of individual records. However, micro-batch systems are also used to ingest continuous data streams and analyze them in real time, and in such cases reducing latency matters more than improving throughput. This paper presents Q-Spark, a QoS (Quality of Service) aware micro-batch stream processing system implemented on Apache Spark. The main idea of the Q-Spark design is to set a deadline for each query and dynamically adjust the batch size so that the deadline is not exceeded. Q-Spark buffers as much data as possible for each micro-batch without exceeding the deadline set for its query. As a result, it guarantees each query's QoS requirement while maintaining throughput comparable to that of the original Spark batching mechanism. Experimental results show that Q-Spark's tail latency always stays within the deadline, whereas the original Spark buffers data for a fixed period using triggers. Compared with the original Spark, which has no concept of a deadline, Q-Spark reduces per-query tail latency by up to 75% while keeping throughput stable.
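The core idea in the abstract is to grow each micro-batch for as long as its oldest record can still finish within the query deadline. A small simulation can illustrate this. It is a minimal sketch under stated assumptions and not Q-Spark's implementation, and it uses no Spark API. Several pieces are hypothetical and introduced only for illustration: the linear batch cost model (`base_cost`, `per_record_cost`), the single executor that runs batches one after another, and the names `deadline_aware_batches`, `estimated_cost`, and `Batch`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Batch:
    first: int     # index of the first record in the batch
    size: int      # number of records in the batch
    start: float   # time the batch is launched
    finish: float  # time the batch finishes processing


def estimated_cost(size: int, base_cost: float, per_record_cost: float) -> float:
    """Hypothetical linear cost model: fixed per-batch overhead plus per-record work."""
    return base_cost + per_record_cost * size


def deadline_aware_batches(arrivals: List[float], deadline: float,
                           base_cost: float, per_record_cost: float) -> List[Batch]:
    """Grow each micro-batch while its oldest record can still finish within `deadline`.

    `arrivals` must be sorted. Batches run one at a time on a single
    (hypothetical) executor, so a batch cannot start before the previous one ends.
    """
    batches: List[Batch] = []
    free_at = 0.0  # time the executor becomes idle
    i, n = 0, len(arrivals)
    while i < n:
        oldest = arrivals[i]
        j = i + 1  # the candidate batch currently holds arrivals[i:j]
        while j < n:
            # Latest launch time that still lets the oldest record meet the
            # deadline if arrivals[j] also joins the batch.
            latest = oldest + deadline - estimated_cost(j + 1 - i, base_cost, per_record_cost)
            if arrivals[j] > latest or free_at > latest:
                break  # adding this record would push the oldest one past its deadline
            j += 1
        size = j - i
        latest = oldest + deadline - estimated_cost(size, base_cost, per_record_cost)
        # Buffer as long as the deadline allows, but never launch before the last
        # record of the batch has arrived or while the executor is still busy.
        start = max(latest, arrivals[j - 1], free_at)
        finish = start + estimated_cost(size, base_cost, per_record_cost)
        batches.append(Batch(i, size, start, finish))
        free_at = finish
        i = j
    return batches


if __name__ == "__main__":
    # Records arrive every 0.1 s; the query deadline is 1.0 s.
    arrivals = [k * 0.1 for k in range(20)]
    for b in deadline_aware_batches(arrivals, deadline=1.0, base_cost=0.2, per_record_cost=0.05):
        worst = b.finish - arrivals[b.first]
        print(f"records {b.first}..{b.first + b.size - 1}: start={b.start:.2f} "
              f"finish={b.finish:.2f} worst latency={worst:.2f}")
```

Each batch is launched at the latest time that still lets its oldest record meet the deadline. Batches therefore stay as large as the deadline permits, which is where the throughput comes from. The simulation makes no guarantee when the executor is overloaded: if the previous batch finishes too late, even a one-record batch can miss the deadline.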