Pengjuan Liang, Yaling Xun, Jianghui Cai, Haifeng Yang
Journal: Future Generation Computer Systems-The International Journal of Escience, volume 174, Article 107909
Publication date: 2025-05-28
DOI: 10.1016/j.future.2025.107909
URL: https://www.sciencedirect.com/science/article/pii/S0167739X25002043
Impact factor: 6.2; JCR: Q1 (Computer Science, Theory & Methods)
Autoscaling of microservice resources based on dense connectivity spatio-temporal GNN and Q-learning
Autoscaling enables cloud-native systems to adapt to dynamic workload changes by scaling out or in without manual intervention. However, under sudden and unpredictable workloads, it becomes particularly difficult to determine which services need to be scaled and how many resources they require, especially when service dependencies are complex, time-varying, and hard to quantify accurately. To adaptively and accurately evaluate the resource requirements of different services under dynamic workloads and to minimize costs under service level agreement (SLA) constraints, a microservice resource autoscaling solution (AGQ) that combines a densely connected Spatio-temporal Graph Neural Network (STGNN) with Q-learning is proposed. AGQ models interdependent microservices as a graph and integrates real-time monitored resource status data into a feature vector for each node. The densely connected STGNN strengthens feature capture and eases gradient propagation, so that future resource usage is predicted more accurately. Finally, Q-learning is used to evaluate scheduling strategies and optimize resource allocation, relying on both historical experience and the predictions of the STGNN model. Experimental results show that the collaborative optimization strategy AGQ adapts better to changes in service dependency relationships and manages resources more accurately. AGQ achieves superior cost efficiency and a lower SLA violation rate than several advanced methods.
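The abstract does not give AGQ's state, action, or reward definitions, so the following is only a minimal sketch of how tabular Q-learning could combine observed utilization with a forecast when choosing a scaling action. Everything here is an assumption made for illustration: the three-action set, the `bucket` discretization, the `reward` shape and its thresholds, and the use of the next true load as a stand-in for an STGNN prediction. It is not the authors' implementation.

```python
import random
from collections import defaultdict

ACTIONS = (-1, 0, 1)   # hypothetical actions: remove a replica, hold, add a replica
MAX_REPLICAS = 8       # hypothetical upper bound on replicas


def bucket(utilization, n_buckets=5):
    """Discretize a non-negative utilization into one of n_buckets levels."""
    return min(int(utilization * n_buckets), n_buckets - 1)


def reward(utilization, replicas, sla_threshold=0.8, replica_cost=0.1):
    """Hypothetical reward: a fixed penalty for SLA risk plus a per-replica cost."""
    sla_penalty = 1.0 if utilization > sla_threshold else 0.0
    return -(sla_penalty + replica_cost * replicas)


def q_update(q, state, action, r, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += alpha * (r + gamma * best_next - q[(state, action)])


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy selection over the scaling actions."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def train(loads, episodes=200, seed=0):
    """Learn a Q-table on a toy load trace for a single service."""
    rng = random.Random(seed)
    q = defaultdict(float)
    last = len(loads) - 1
    for _ in range(episodes):
        replicas = 1
        for t in range(last):
            util = loads[t] / replicas
            # Assumption: the true next load stands in for the STGNN forecast.
            predicted = loads[t + 1] / replicas
            state = (bucket(util), bucket(predicted))
            action = choose_action(q, state, epsilon=0.1, rng=rng)
            replicas = max(1, min(MAX_REPLICAS, replicas + action))
            next_util = loads[t + 1] / replicas
            next_pred = loads[min(t + 2, last)] / replicas
            next_state = (bucket(next_util), bucket(next_pred))
            q_update(q, state, action, reward(next_util, replicas), next_state)
    return q


if __name__ == "__main__":
    q_table = train([0.5, 1.5, 2.5, 1.0, 0.4])
    print(len(q_table), "state-action values learned")
```

Placing the forecast in the state is one simple way to let the agent act before a load spike arrives rather than after it. A real system would need per-service states derived from the dependency graph, which this single-service toy omits.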
About the journal:
Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications.
Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration.
Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.