A deep reinforcement learning based algorithm for time and cost optimized scaling of serverless applications

IF 6.2 | Zone 2, Computer Science | JCR Q1, COMPUTER SCIENCE, THEORY & METHODS

Future Generation Computer Systems-The International Journal of Escience Pub Date : 2025-05-05 DOI:10.1016/j.future.2025.107873

Anupama Mampage, Shanika Karunasekera, Rajkumar Buyya

Volume 173, Article 107873 | https://www.sciencedirect.com/science/article/pii/S0167739X25001682

Citations: 0

Abstract

Serverless computing has gained strong traction in the cloud computing community in recent years. Among the many benefits of this novel computing model, the rapid auto-scaling of user applications stands out. However, offering ad hoc scaling of user deployments at the function level introduces many complications to serverless systems. One prevalent shortcoming is the cold-start delay: the added delay and failures in function request executions caused by the time needed to dynamically create new resources to suit function workloads. Maintaining idle resource pools to alleviate this issue often results in wasted resources from the cloud provider's perspective. Existing solutions mostly focus on predicting and understanding function load levels in order to proactively create the required resources. Although these solutions improve function performance, making scaling decisions without an understanding of overall system characteristics often leads to sub-optimal usage of system resources. Further, the multi-tenant nature of serverless systems requires a scalable solution that adapts to multiple co-existing applications, a requirement that most current solutions do not meet. In this paper, we introduce a novel multi-agent Deep Reinforcement Learning based solution for both horizontal and vertical scaling of function resources, based on a comprehensive understanding of both function and system requirements. Our solution improves function performance by reducing cold starts, while also giving service providers the flexibility to optimize resource maintenance cost. Experiments across varying workload scenarios show improvements of up to 23% in application latency and up to 34% in request failures or, alternatively, savings of up to 45% in infrastructure cost for the service providers.


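Source journal

Future Generation Computer Systems-The International Journal of Escience (Engineering & Technology - Computer Science: Theory & Methods)

CiteScore: 19.90

Self-citation rate: 2.70%

Articles published: 376

Review time: 10.6 months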

About the journal: Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications. Furthermore, with the explosion of Big Data, there is a need for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration. Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.