ASTRA: Adversarial Sim-to-Real Transfer Reinforcement Learning for Autoscaling in Cloud Systems

IF 5.6; Zone 1 (Computer Science); JCR Q1 (Computer Science, Software Engineering)

IEEE Transactions on Software Engineering. Pub Date: 2025-08-29. DOI: 10.1109/TSE.2025.3603995

Tiangang Li; Shi Ying; Xiangbo Tian; Ting Zhang; Yong Wang

Volume 51, Issue 10, pp. 2921-2941. Publication type: Journal Article. Open access: no.

Full text: https://ieeexplore.ieee.org/document/11145190/

Indexed via: Semantic Scholar

Citations: 0

Abstract



With the widespread adoption of cloud computing, autoscaling has become crucial for efficient resource management and stable service provision in cloud systems. In recent years, autoscaling methods based on deep reinforcement learning (DRL) have gained significant attention due to their outstanding adaptability and flexibility. However, training DRL-based autoscaler requires interactions with real cloud systems, incurring high interaction costs, low data collection efficiency, and potential operational impacts. To address these challenges, we propose ASTRA, a sim-to-real transfer reinforcement learning framework for autoscaling. ASTRA constructs a cloud system simulation environment based on a performance estimation model, enabling low-cost and high-efficiency training sample collection for policy learning. The learned policy is subsequently transferred to the real systems for scaling decisions. To address performance modeling inaccuracies caused by dynamic cloud state changes, we propose a performance modeling method based on hybrid attentive state space model. By incorporating state space model, it captures system dynamics and state evolution, effectively reducing simulation errors. Furthermore, to mitigate the performance degradation of the transferred policy due to the distribution shift, we propose an autoscaling method based on adversarial soft actor-critic. By introducing adversarial policy training with gradient regularization based on state perturbations, it significantly improves transferred policy performance. The results in the real system demonstrate that ASTRA achieves optimal overall performance in environment modeling, policy transfer and real-world autoscaling. Specifically, ASTRA outperforms all baselines in terms of instance number, response time, SLO violation rate, and CPU utilization under different workload patterns. More importantly, under limited interaction costs, ASTRA achieves a 616.94× improvement in interaction sample collection rate compared to direct online training method.
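The abstract does not give implementation details, so the following is only a rough illustration of the general idea it describes: collecting training samples from a simulated environment driven by a performance estimation model instead of from a live system. Every name here (`estimate_response_time`, `SimulatedCloudEnv`, `collect_samples`, `threshold_policy`) is hypothetical. The closed-form latency formula is a placeholder assumption, not the paper's hybrid attentive state space model, and the threshold rule stands in for a learned DRL policy.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

State = Tuple[float, int]  # (workload in requests/s, running instances)


def estimate_response_time(workload: float, instances: int,
                           capacity: float = 100.0,
                           base_latency: float = 0.05) -> float:
    """Toy stand-in for a learned performance estimation model (assumption)."""
    # Latency grows sharply as utilization approaches 1; cap it to stay finite.
    utilization = min(workload / (instances * capacity), 0.99)
    return base_latency / (1.0 - utilization)


@dataclass
class SimulatedCloudEnv:
    trace: List[float]           # workload per step, in requests per second
    min_instances: int = 1
    max_instances: int = 20
    slo_seconds: float = 0.2     # response-time objective (hypothetical value)
    instance_cost: float = 0.01  # penalty per running instance (hypothetical)
    t: int = 0
    instances: int = 1

    def reset(self) -> State:
        self.t = 0
        self.instances = self.min_instances
        return (self.trace[0], self.instances)

    def step(self, action: int) -> Tuple[State, float, bool]:
        """Apply a scaling action in {-1, 0, +1}; return (state, reward, done)."""
        self.instances = max(self.min_instances,
                             min(self.max_instances, self.instances + action))
        latency = estimate_response_time(self.trace[self.t], self.instances)
        violation = 1.0 if latency > self.slo_seconds else 0.0
        # Reward trades off SLO violations against the number of instances.
        reward = -violation - self.instance_cost * self.instances
        self.t += 1
        done = self.t >= len(self.trace)
        next_workload = self.trace[min(self.t, len(self.trace) - 1)]
        return (next_workload, self.instances), reward, done


def collect_samples(env: SimulatedCloudEnv,
                    policy: Callable[[State], int]):
    """Roll out one episode and return (state, action, reward, next_state) tuples."""
    samples = []
    state, done = env.reset(), False
    while not done:
        action = policy(state)
        next_state, reward, done = env.step(action)
        samples.append((state, action, reward, next_state))
        state = next_state
    return samples


def threshold_policy(state: State) -> int:
    """Placeholder rule-based policy; a DRL agent would replace this."""
    workload, instances = state
    utilization = workload / (instances * 100.0)
    if utilization > 0.7:
        return 1
    if utilization < 0.3:
        return -1
    return 0


env = SimulatedCloudEnv(trace=[50.0, 150.0, 300.0, 300.0, 80.0])
samples = collect_samples(env, threshold_policy)
```

Because each step is a function call rather than a real scaling operation, a simulator of this kind can produce samples far faster than a live cluster. The policy it trains is only as good as the performance model, which is the gap the abstract says the adversarial training is meant to address.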


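Source journal

IEEE Transactions on Software Engineering (Engineering & Technology; Engineering: Electrical & Electronic)

CiteScore: 9.70

Self-citation rate: 10.80%

Articles published: 724

Review time: 6 months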

Journal description: IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:

a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.
b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.
c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.
d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.
e) System issues: Hardware-software trade-offs.
f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.