A Framework for Dynamically Meeting Performance Objectives on a Service Mesh

IF 4.7, Zone 2 (Computer Science), JCR Q1 (Computer Science, Information Systems)

IEEE Transactions on Network and Service Management, Pub Date: 2024-07-26, DOI: 10.1109/TNSM.2024.3434328

Forough Shahab Samani; Rolf Stadler

Volume 21, Issue 6, pp. 5992-6007. Full text: https://ieeexplore.ieee.org/document/10612769/

Citations: 0

Abstract

A Framework for Dynamically Meeting Performance Objectives on a Service Mesh

We present a framework for achieving end-to-end management objectives for multiple services that concurrently execute on a service mesh. We apply reinforcement learning (RL) techniques to train an agent that periodically performs control actions to reallocate resources. We develop and evaluate the framework using a laboratory testbed where we run information and computing services on a service mesh, supported by the Istio and Kubernetes platforms. We investigate different management objectives that include end-to-end delay bounds on service requests, throughput objectives, cost-related objectives, and service differentiation. Our framework supports the design of a control agent for a given management objective. First, the management objective is defined and then mapped onto available control actions. Several types of control actions can be executed simultaneously, which allows for efficient resource utilization. Second, the framework separates the learning of the system model and the operating region from the learning of the control policy. By first learning the system model and the operating region from testbed traces, we can instantiate a simulator and train the agent for different management objectives. Third, the use of a simulator shortens the training time by orders of magnitude compared with training the agent on the testbed. We evaluate the learned policies on the testbed and show the effectiveness of our approach in several scenarios. In one scenario, we design a controller that achieves the management objectives with 50% fewer system resources than Kubernetes HPA autoscaling.
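The separation described in the abstract (learn a system model and operating region from traces, instantiate a simulator, then train the agent in it) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the paper: the synthetic traces, the linear delay model, the reward shape, the one-step tabular Q-learning stand-in, and all function names (`collect_traces`, `fit_model`, `operating_region`, `simulate_reward`, `train_agent`) are hypothetical.

```python
import random

def collect_traces(seed=0, repeats=4):
    """Synthetic stand-in for testbed traces: (load req/s, replicas, delay ms)."""
    rng = random.Random(seed)
    traces = []
    for load in range(10, 101, 10):
        for replicas in range(1, 6):
            for _ in range(repeats):
                # Assumed ground truth plus measurement noise (not from the paper).
                delay = 5.0 + 2.0 * load / replicas + rng.gauss(0.0, 0.5)
                traces.append((float(load), replicas, delay))
    return traces

def fit_model(traces):
    """Step 1: fit delay ~ a + b * (load / replicas) by ordinary least squares."""
    xs = [load / r for load, r, _ in traces]
    ys = [d for _, _, d in traces]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def operating_region(traces):
    """Step 1b: the operating region is the range of values seen in the traces."""
    loads = [t[0] for t in traces]
    reps = [t[1] for t in traces]
    return (min(loads), max(loads)), (min(reps), max(reps))

def simulate_reward(model, load, replicas, delay_bound=60.0, penalty=10.0):
    """Step 2: simulator built on the learned model; cost plus delay-bound penalty."""
    a, b = model
    delay = a + b * load / replicas
    return -(replicas + (penalty if delay > delay_bound else 0.0))

def train_agent(model, region, episodes=20000, bins=9, eps=0.1, lr=0.1, seed=1):
    """Step 3: one-step tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    (lo, hi), (rmin, rmax) = region
    actions = list(range(rmin, rmax + 1))
    width = (hi - lo) / bins
    q = [[0.0] * len(actions) for _ in range(bins)]
    for _ in range(episodes):
        s = rng.randrange(bins)
        load = lo + (s + 1) * width  # conservative: upper edge of the load bin
        if rng.random() < eps:
            i = rng.randrange(len(actions))
        else:
            i = max(range(len(actions)), key=lambda k: q[s][k])
        q[s][i] += lr * (simulate_reward(model, load, actions[i]) - q[s][i])
    # Greedy policy: replica count chosen for each load bin.
    return [actions[max(range(len(actions)), key=lambda k: q[s][k])] for s in range(bins)]

traces = collect_traces()
model = fit_model(traces)
region = operating_region(traces)
policy = train_agent(model, region)
print("learned model (a, b):", model)
print("replicas per load bin:", policy)
```

Because the agent only ever queries `simulate_reward`, a different management objective could be tried by swapping the reward function and retraining, without collecting new traces. That is the kind of reuse the abstract attributes to separating model learning from policy learning.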

Source journal

IEEE Transactions on Network and Service Management (Computer Science: Computer Networks and Communications)

CiteScore: 9.30

Self-citation rate: 15.10%

Articles published: 325

About the journal: IEEE Transactions on Network and Service Management will publish (online only) peer-reviewed, archival-quality papers that advance the state of the art and practical applications of network and service management. Theoretical research contributions (presenting new concepts and techniques) and applied contributions (reporting on experiences and experiments with actual systems) will be encouraged. These transactions will focus on the key technical issues related to: Management Models, Architectures and Frameworks; Service Provisioning, Reliability and Quality Assurance; Management Functions; Enabling Technologies; Information and Communication Models; Policies; Applications and Case Studies; Emerging Technologies and Standards.