{"title":"Cloud Load Balancers Need to Stay Off the Data Path","authors":"Yuchen Zhang;Shuai Jin;Zhenyu Wen;Shibo He;Qingzheng Hou;Yang Song;Zhigang Zong;Xiaomin Wu;Bengbeng Xue;Chenghao Sun;Ku Li;Xing Li;Biao Lyu;Rong Wen;Jiming Chen;Shunmin Zhu","doi":"10.1109/TCC.2025.3595172","DOIUrl":null,"url":null,"abstract":"Load balancers (LBs) are crucial in cloud environments, ensuring workload scalability. They route packets destined for a service (identified by a virtual IP address, or VIP) to a group of servers designated to deliver that service, each with its direct IP address (DIP). Consequently, LBs significantly impact the performance of cloud services and the experience of tenants. Many academic studies focus on specific issues such as designing new load balancing algorithms and developing hardware load balancing devices to enhance the LB’s performance, reliability, and scalability. However, we believe this approach is not ideal for cloud data centers for the following reasons: (i) the increasing demands of users and the variety of cloud service types turn the LB into a bottleneck; and (ii) continually adding machines or upgrading hardware devices can incur substantial costs. In this paper, we propose the Next Generation Load Balancer (NGLB), designed to bypass the TCP connection datapath from the LB, thereby eliminating latency overheads and scalability bottlenecks of traditional cloud LBs. The LB only participates in the TCP connection establishment phase. The three key features of our design are: (i) the introduction of an <italic>active address learning</i> model to redirect traffic and bypass the LB, (ii) a <italic>multi-tenant isolation</i> mechanism for deployment within multi-tenant Virtual Private Cloud networks, and (iii) a distributed flow control method, known as <italic>hierarchical connection cleaner</i>, designed to ensure the availability of backend resources. The evaluation results demonstrate that NGLB reduces latency by 16% and increases nearly 3× throughput. With the same LB resources, NGLB improves 10× rate of new connection establishment. More importantly, five years of operational experience has proven NGLB’s stability for high-bandwidth services.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"13 3","pages":"1078-1090"},"PeriodicalIF":5.0000,"publicationDate":"2025-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Cloud Computing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/11108264/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
Load balancers (LBs) are crucial in cloud environments, ensuring workload scalability. They route packets destined for a service (identified by a virtual IP address, or VIP) to a group of servers designated to deliver that service, each with its direct IP address (DIP). Consequently, LBs significantly impact the performance of cloud services and the experience of tenants. Many academic studies focus on specific issues such as designing new load balancing algorithms and developing hardware load balancing devices to enhance the LB’s performance, reliability, and scalability. However, we believe this approach is not ideal for cloud data centers for the following reasons: (i) the increasing demands of users and the variety of cloud service types turn the LB into a bottleneck; and (ii) continually adding machines or upgrading hardware devices can incur substantial costs. In this paper, we propose the Next Generation Load Balancer (NGLB), designed to bypass the TCP connection datapath from the LB, thereby eliminating latency overheads and scalability bottlenecks of traditional cloud LBs. The LB only participates in the TCP connection establishment phase. The three key features of our design are: (i) the introduction of an active address learning model to redirect traffic and bypass the LB, (ii) a multi-tenant isolation mechanism for deployment within multi-tenant Virtual Private Cloud networks, and (iii) a distributed flow control method, known as hierarchical connection cleaner, designed to ensure the availability of backend resources. The evaluation results demonstrate that NGLB reduces latency by 16% and increases nearly 3× throughput. With the same LB resources, NGLB improves 10× rate of new connection establishment. More importantly, five years of operational experience has proven NGLB’s stability for high-bandwidth services.
期刊介绍:
The IEEE Transactions on Cloud Computing (TCC) is dedicated to the multidisciplinary field of cloud computing. It is committed to the publication of articles that present innovative research ideas, application results, and case studies in cloud computing, focusing on key technical issues related to theory, algorithms, systems, applications, and performance.