AuTraScale: An Automated and Transfer Learning Solution for Streaming System Auto-Scaling

Liang Zhang, Wenli Zheng, Chao Li, Yao Shen, M. Guo
{"title":"AuTraScale: An Automated and Transfer Learning Solution for Streaming System Auto-Scaling","authors":"Liang Zhang, Wenli Zheng, Chao Li, Yao Shen, M. Guo","doi":"10.1109/IPDPS49936.2021.00100","DOIUrl":null,"url":null,"abstract":"The complexity and variability of streaming data have brought a great challenge to the elasticity of the data processing systems. Streaming systems, such as Flink and Storm, need to adapt to the changes of workload with auto-scaling to meet the QoS requirements while saving resources. However, the accuracy of classical models (such as a queueing model) for QoS prediction decreases with the increase of the complexity and variability of streaming data and the resource interference. On the other hand, the indirect metrics used to optimize QoS may not accurately guide resource adjustment. Those problems can easily lead to waste of resources or QoS violation in practice. To solve the above problems, we propose AuTraScale, an automated and transfer learning auto-scaling solution, to determine the appropriate parallelism and resource allocation that meet the latency and throughput targets. AuTraScale uses Bayesian optimization to adapt to the complex relationship between resources and QoS, minimizing the impact of resource interference on the prediction accuracy, and a new metric that measures the performance of operators for accurate optimization. Even when the input data rate changes, it can quickly adjust the parallelism of each operator in response, with a transfer learning algorithm. We have implemented and evaluated AuTraScale on a Flink platform. The experimental results show that, compared with the state-of-the-art method like DRS and DS2, AuTraScale can reduce 66.6% and 36.7% resource consumption respectively in the scale-down and scale-up scenarios while ensuring QoS requirements, and save 13.5% resource on average when the input data rate changes.","PeriodicalId":372234,"journal":{"name":"2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"151 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPS49936.2021.00100","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

The complexity and variability of streaming data pose a great challenge to the elasticity of data processing systems. Streaming systems such as Flink and Storm need auto-scaling to adapt to workload changes, meeting QoS requirements while saving resources. However, the accuracy of classical models for QoS prediction (such as queueing models) decreases as the complexity and variability of streaming data and the resource interference grow. In addition, the indirect metrics used to optimize QoS may not accurately guide resource adjustment. In practice, these problems easily lead to wasted resources or QoS violations. To solve them, we propose AuTraScale, an automated auto-scaling solution based on transfer learning that determines the parallelism and resource allocation needed to meet latency and throughput targets. AuTraScale uses Bayesian optimization to capture the complex relationship between resources and QoS, minimizing the impact of resource interference on prediction accuracy, together with a new metric that measures operator performance for accurate optimization. When the input data rate changes, a transfer learning algorithm lets it quickly readjust the parallelism of each operator. We have implemented and evaluated AuTraScale on a Flink platform. The experimental results show that, compared with state-of-the-art methods such as DRS and DS2, AuTraScale reduces resource consumption by 66.6% in the scale-down scenario and 36.7% in the scale-up scenario while meeting QoS requirements, and saves 13.5% of resources on average when the input data rate changes.
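
To make the optimization step concrete, the following is a minimal sketch, under our own assumptions rather than the paper's implementation, of how Bayesian optimization can propose the next operator parallelism to evaluate: a Gaussian-process surrogate is fitted to previously observed (parallelism, QoS score) pairs, and an expected-improvement acquisition function selects the next candidate. The variable names, the candidate range, and the scoring scheme are illustrative assumptions, not details taken from AuTraScale.

```python
# Minimal sketch of a Bayesian-optimization step for choosing operator
# parallelism (illustrative only; not AuTraScale's actual implementation).
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Hypothetical observations: parallelism values already tried, and a QoS score
# for each (higher = latency/throughput targets met with fewer resources).
observed_parallelism = np.array([[1], [2], [4], [8]])
observed_score = np.array([0.20, 0.55, 0.80, 0.70])

# Gaussian-process surrogate of the parallelism -> QoS-score relationship.
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
gp.fit(observed_parallelism, observed_score)

def expected_improvement(candidates, model, best_so_far, xi=0.01):
    """Standard expected-improvement acquisition over candidate parallelisms."""
    mu, sigma = model.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)           # guard against division by zero
    improvement = mu - best_so_far - xi
    z = improvement / sigma
    return improvement * norm.cdf(z) + sigma * norm.pdf(z)

# Score every candidate parallelism (1..16 here) and pick the most promising
# one to measure next on the running job.
candidates = np.arange(1, 17).reshape(-1, 1)
ei = expected_improvement(candidates, gp, observed_score.max())
next_parallelism = int(candidates[int(np.argmax(ei)), 0])
print("next parallelism to evaluate:", next_parallelism)
```

In the real system, the score would come from measuring the running Flink job against its latency and throughput targets and feeding the result back into the surrogate; the sketch only shows one propose-and-measure iteration of such a loop.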