Suyeon Lee, Yeonwoo Jeong, Minwoo Kim, Sungyong Park
Q-Spark: QoS Aware Micro-batch Stream Processing System Using Spark
2021 IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion (ACSOS-C), published 2021-09-01
DOI: 10.1109/ACSOS-C52956.2021.00027 (https://doi.org/10.1109/ACSOS-C52956.2021.00027)
Citations: 1
Abstract
Unlike event-driven stream processing systems, micro-batch stream processing systems collect input data for a certain period of time before processing it, because they focus on improving the throughput of the entire system rather than reducing the latency of each record. However, micro-batch systems must also ingest continuous data streams and analyze them in real time in settings where reducing latency is more important than improving throughput. This paper presents Q-Spark, a QoS (Quality of Service) aware micro-batch stream processing system implemented on Apache Spark. The main idea of the Q-Spark design is to set a deadline for each query and dynamically adjust the batch size so that the deadline is not exceeded. Because Q-Spark executes a micro-batch after buffering as much data as possible up to each query's deadline, it guarantees the QoS requirement of each query while keeping throughput comparable to that of the original Spark batching mechanism. Experimental results show that the tail latency of Q-Spark is always bounded by the deadline, in contrast to the original Spark, where data is buffered using triggers for a fixed period. As a result, Q-Spark reduces the tail latency per query by up to 75% while maintaining stable throughput compared to the original Spark, which has no concept of a deadline.
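The idea of buffering as much as possible without missing a per-query deadline can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Q-Spark's actual algorithm or any Spark API: the function name `batch_stream`, the linear cost model (`fixed_ms + n * per_record_ms`), and the choice to ignore queueing between consecutive batches are all hypothetical simplifications introduced here.

```python
from typing import List, Tuple


def batch_stream(
    arrivals_ms: List[int],
    deadline_ms: int,
    per_record_ms: int,
    fixed_ms: int = 0,
) -> List[Tuple[int, int]]:
    """Group sorted arrival times into micro-batches under a latency deadline.

    Hypothetical model (an assumption, not taken from the paper): processing a
    batch of n records takes fixed_ms + n * per_record_ms, and batches do not
    queue behind each other. Returns (batch_size, worst_case_latency_ms) pairs,
    where the latency is measured for the oldest record in each batch.
    """

    def cost(n: int) -> int:
        return fixed_ms + n * per_record_ms

    if cost(1) > deadline_ms:
        raise ValueError("a single record cannot meet the deadline")

    batches: List[Tuple[int, int]] = []
    buf: List[int] = []  # arrival timestamps of buffered records

    def flush(at_ms: int) -> None:
        # Latency of the oldest record = waiting time + processing time.
        latency = (at_ms - buf[0]) + cost(len(buf))
        batches.append((len(buf), latency))
        buf.clear()

    for ts in arrivals_ms:
        if buf:
            # Latest moment the current buffer can start and still finish in time.
            latest_start = buf[0] + deadline_ms - cost(len(buf))
            # Would the oldest record miss the deadline if this record joined?
            too_big = (ts - buf[0]) + cost(len(buf) + 1) > deadline_ms
            if ts > latest_start or too_big:
                flush(min(ts, latest_start))
        buf.append(ts)

    if buf:
        # End of stream: run the last batch at its latest admissible start.
        flush(buf[0] + deadline_ms - cost(len(buf)))
    return batches


if __name__ == "__main__":
    # 20 records arriving every 10 ms, with a 100 ms deadline per query.
    arrivals = list(range(0, 200, 10))
    for size, latency in batch_stream(arrivals, 100, per_record_ms=5, fixed_ms=20):
        print(f"batch of {size} records, worst-case latency {latency} ms")
```

In this model the batch size is not fixed in advance: it grows until one more record, or further waiting, would push the oldest buffered record past the deadline, so the worst-case latency of every batch stays at or below `deadline_ms`. A fixed-interval trigger, by contrast, would flush on a timer regardless of how long the resulting batch takes to process.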