Gwydion: Efficient auto-scaling for complex containerized applications in Kubernetes through Reinforcement Learning

IF 7.7 · CAS Zone 2 (Computer Science) · Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Journal of Network and Computer Applications Pub Date : 2024-11-26 DOI:10.1016/j.jnca.2024.104067

José Santos, Efstratios Reppas, Tim Wauters, Bruno Volckaert, Filip De Turck

Volume 234, Article 104067. Full text: https://www.sciencedirect.com/science/article/pii/S1084804524002443

Citations: 0

Abstract

Containers have reshaped application deployment and life-cycle management in recent cloud platforms. The paradigm shift from large monolithic applications to complex graphs of loosely-coupled microservices aims to increase deployment flexibility and operational efficiency. However, efficient allocation and scaling of microservice applications is challenging due to their intricate inter-dependencies. Existing works do not consider microservice dependencies, which could lead to the application’s performance degradation when service demand increases. As dependencies increase, communication between microservices becomes more complex and frequent, leading to slower response times and higher resource consumption, especially during high demand. In addition, performance issues in one microservice can also trigger a ripple effect across dependent services, exacerbating the performance degradation across the entire application. This paper studies the impact of microservice inter-dependencies in auto-scaling by proposing Gwydion, a novel framework that enables different auto-scaling goals through Reinforcement Learning (RL) algorithms. Gwydion has been developed based on the OpenAI Gym library and customized for the popular Kubernetes (K8s) platform to bridge the gap between RL and auto-scaling research by training RL algorithms on real cloud environments for two opposing reward strategies: cost-aware and latency-aware. Gwydion focuses on improving resource usage and reducing the application’s response time by considering microservice inter-dependencies when scaling horizontally. Experiments with microservice benchmark applications, such as Redis Cluster (RC) and Online Boutique (OB), show that RL agents can reduce deployment costs and the application’s response time compared to default scaling mechanisms, achieving up to 50% lower latency while avoiding performance degradation. 
For RC, cost-aware algorithms can reduce the number of deployed pods (2 to 4), resulting in slightly higher latency (300 μs to 6 ms) but lower resource consumption. For OB, all RL algorithms exhibit a notable response time improvement by considering all microservices in the observation space, enabling the sequential triggering of actions across different deployments. This leads to nearly 30% cost savings while maintaining consistently lower latency throughout the experiment. Gwydion aims to advance auto-scaling research in a rapidly evolving dynamic cloud environment.
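The tension between the two reward strategies named in the abstract can be illustrated with a small sketch. This is not Gwydion's code: the class `ToyScalingEnv`, the helper `greedy_pods`, the workload figures, and the latency formula are all hypothetical, and the `reset`/`step` methods only loosely mirror the Gym convention without importing the library. The sketch assumes a single deployment, whereas the paper observes all microservices together.

```python
from dataclasses import dataclass, replace


@dataclass
class ToyScalingEnv:
    """Toy stand-in for a horizontal auto-scaling environment (hypothetical)."""
    reward_mode: str = "cost"         # "cost" or "latency" (assumed names)
    min_pods: int = 1
    max_pods: int = 10
    demand: float = 400.0             # requests/s, made-up workload
    capacity_per_pod: float = 100.0   # requests/s per pod, made-up figure
    pods: int = 1

    def latency_ms(self) -> float:
        # Made-up model: latency falls as load is spread over more pods.
        load_per_pod = self.demand / (self.pods * self.capacity_per_pod)
        return 2.0 * (1.0 + load_per_pod)

    def reset(self):
        self.pods = self.min_pods
        return (self.pods, self.latency_ms())

    def step(self, action: int):
        # Actions: 0 = remove a pod, 1 = do nothing, 2 = add a pod.
        self.pods = max(self.min_pods, min(self.max_pods, self.pods + action - 1))
        latency = self.latency_ms()
        if self.reward_mode == "cost":
            reward = -float(self.pods)   # fewer pods means a higher reward
        else:
            reward = -latency            # lower latency means a higher reward
        return (self.pods, latency), reward


def greedy_pods(env: ToyScalingEnv, steps: int = 20) -> int:
    """One-step greedy policy: try each action on a copy, apply the best one."""
    env.reset()
    for _ in range(steps):
        best = max(range(3), key=lambda a: replace(env).step(a)[1])
        env.step(best)
    return env.pods


cost_pods = greedy_pods(ToyScalingEnv(reward_mode="cost"))
latency_pods = greedy_pods(ToyScalingEnv(reward_mode="latency"))
print(cost_pods, latency_pods)
```

Under these assumptions the cost-aware reward keeps the deployment at the minimum pod count while the latency-aware reward drives it to the maximum, which is why the abstract calls the two strategies opposing. A trained RL agent would replace the greedy policy used here.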


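Source journal

Journal of Network and Computer Applications (Engineering & Technology, Computer Science: Interdisciplinary Applications)

CiteScore: 21.50
Self-citation rate: 3.40%
Articles published: 142
Review time: 37 days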

Journal description: The Journal of Network and Computer Applications welcomes research contributions, surveys, and notes in all areas relating to computer networks and applications thereof. Sample topics include new design techniques; interesting or novel applications, components, or standards; computer networks with tools such as WWW; emerging standards for internet protocols; wireless networks; mobile computing; emerging computing models such as cloud computing and grid computing; and applications of networked systems for remote collaboration and telemedicine. The journal is abstracted and indexed in Scopus, Engineering Index, Web of Science, Science Citation Index Expanded, and INSPEC.