Kai Peng;Yi Hu;Haonan Ding;Haoxuan Chen;Liangyuan Wang;Chao Cai;Menglan Hu
Title: Large-Scale Service Mesh Orchestration With Probabilistic Routing in Cloud Data Centers
Journal: IEEE Transactions on Services Computing, vol. 18, no. 2, pp. 868-882
Publication date: 2025-01-20
Publication type: Journal Article
DOI: 10.1109/TSC.2025.3526373
URL: https://ieeexplore.ieee.org/document/10847942/
Impact factor: 5.5
JCR: Q1 (COMPUTER SCIENCE, INFORMATION SYSTEMS)
Open access: no
Citations: 0
Abstract
Service mesh architectures are emerging as a promising microservice paradigm for developing online cloud applications. However, in large-scale microservice scenarios, frequent service communications, intricate call dependencies, and stringent latency requirements put great pressure on efficient service mesh orchestration. In this setting, service deployment and request routing are tightly coupled and interdependent and cannot be optimized effectively in isolation, which increases the difficulty of collaborative orchestration. The difficulty grows further when microservice multiplexing, parallel dependencies, and multi-instance modeling are considered. Nonetheless, most existing work has failed to propose appropriate models and methods for these challenges. Therefore, this article studies large-scale service mesh orchestration with probabilistic routing and constrained bandwidths for parallel call graphs. We leverage open Jackson queuing network theory to capture crucial microservices and analyze request processing, queuing, and communication latency for massive user requests in a fine-grained way. We then propose an efficient three-stage heuristic that achieves elegant multi-instance consolidation and probabilistic multi-queue routing to reduce response latency and cost. We also provide the algorithm's complexity and a mathematical analysis of its performance. Finally, extensive trace-driven experiments validate the superiority of the proposed algorithm over other baselines.
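To make the queuing idea in the abstract concrete, the following is a minimal sketch of the textbook open Jackson network calculation, not the model or heuristic from the article. It assumes each microservice instance behaves as a single M/M/1 queue, ignores communication latency and bandwidth constraints, and uses hypothetical names (`arrival_rates`, `mean_response_time`) and made-up rates. Probabilistic routing appears as a matrix whose entry `routing[i][j]` is the probability that a request leaving node `i` is forwarded to node `j`.

```python
from typing import List


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: check the routing matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def arrival_rates(gamma: List[float], routing: List[List[float]]) -> List[float]:
    """Solve the traffic equations lambda_j = gamma_j + sum_i lambda_i * P[i][j]."""
    n = len(gamma)
    # Row i encodes: lambda_i - sum_j P[j][i] * lambda_j = gamma_i
    a = [[(1.0 if i == j else 0.0) - routing[j][i] for j in range(n)]
         for i in range(n)]
    return solve_linear(a, gamma)


def mean_response_time(gamma: List[float], routing: List[List[float]],
                       mu: List[float]) -> float:
    """Mean end-to-end sojourn time, assuming every node is an M/M/1 queue."""
    lam = arrival_rates(gamma, routing)
    jobs_in_system = 0.0
    for rate, service in zip(lam, mu):
        rho = rate / service  # utilization of this node
        if rho >= 1.0:
            raise ValueError("unstable node: arrival rate >= service rate")
        jobs_in_system += rho / (1.0 - rho)  # mean jobs at an M/M/1 node
    return jobs_in_system / sum(gamma)  # Little's law over the whole network


# Hypothetical example: a gateway (node 0) splits traffic 60/40 between two
# instances (nodes 1 and 2); requests leave the network after either instance.
gamma = [10.0, 0.0, 0.0]            # external arrivals per second
routing = [[0.0, 0.6, 0.4],
           [0.0, 0.0, 0.0],
           [0.0, 0.0, 0.0]]
mu = [20.0, 10.0, 8.0]              # service rates per second
print(arrival_rates(gamma, routing))
print(mean_response_time(gamma, routing, mu))
```

In this simplified setting, changing the split probabilities in `routing` shifts load between instances and changes the mean response time, which is the kind of trade-off a joint deployment and routing method has to balance.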
About the journal:
IEEE Transactions on Services Computing encompasses the computing and software aspects of the science and technology of services innovation research and development. It places emphasis on algorithmic, mathematical, statistical, and computational methods central to services computing. Topics covered include Service Oriented Architecture, Web Services, Business Process Integration, Solution Performance Management, and Services Operations and Management. The transactions address mathematical foundations, security, privacy, agreement, contract, discovery, negotiation, collaboration, and quality of service for web services. It also covers areas like composite web service creation, business and scientific applications, standards, utility models, business process modeling, integration, collaboration, and more in the realm of Services Computing.