Lizhuang Tan;Zhuo Jiang;Kefei Liu;Haoran Wei;Pengfei Huo;Huiling Shi;Wei Zhang;Wei Su
{"title":"ByteTuning: Watermark Tuning for RoCEv2","authors":"Lizhuang Tan;Zhuo Jiang;Kefei Liu;Haoran Wei;Pengfei Huo;Huiling Shi;Wei Zhang;Wei Su","doi":"10.1109/TCC.2025.3525496","DOIUrl":null,"url":null,"abstract":"RDMA over Converged Ethernet v2 (RoCEv2) is one of the most popular high-speed datacenter networking solutions. Watermark is the general term for various trigger and release thresholds of RoCEv2 flow control protocols, and its reasonable configuration is an important factor affecting RoCEv2 performance. In this paper, we propose ByteTuning, a centralized watermark tuning system for RoCEv2. First, three real cases of network performance degradation caused by non-optimal or improper watermark configuration are reported, and the network performance results of different watermark configurations in three typical scenarios are traversed, indicating the necessity of watermark tuning. Then, based on the RDMA Fluid model, the influence of watermark on the RoCEv2 performance is modeled and evaluated. Next, the design of the ByteTuning is introduced, which includes three mechanisms. They are 1) using simulated annealing algorithm to make the real-time watermark converge to the near-optimal configuration, 2) using network telemetry to optimize the feedback overhead, 3) compressing the search space to improve the tuning efficiency. Finally, We validate the performance of ByteTuning in multiple real datacenter networking environments, and the results show that ByteTuning outperforms existing solutions.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"13 1","pages":"303-320"},"PeriodicalIF":5.3000,"publicationDate":"2025-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Cloud Computing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10820527/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
RDMA over Converged Ethernet v2 (RoCEv2) is one of the most popular high-speed datacenter networking solutions. Watermark is the general term for various trigger and release thresholds of RoCEv2 flow control protocols, and its reasonable configuration is an important factor affecting RoCEv2 performance. In this paper, we propose ByteTuning, a centralized watermark tuning system for RoCEv2. First, three real cases of network performance degradation caused by non-optimal or improper watermark configuration are reported, and the network performance results of different watermark configurations in three typical scenarios are traversed, indicating the necessity of watermark tuning. Then, based on the RDMA Fluid model, the influence of watermark on the RoCEv2 performance is modeled and evaluated. Next, the design of the ByteTuning is introduced, which includes three mechanisms. They are 1) using simulated annealing algorithm to make the real-time watermark converge to the near-optimal configuration, 2) using network telemetry to optimize the feedback overhead, 3) compressing the search space to improve the tuning efficiency. Finally, We validate the performance of ByteTuning in multiple real datacenter networking environments, and the results show that ByteTuning outperforms existing solutions.
期刊介绍:
The IEEE Transactions on Cloud Computing (TCC) is dedicated to the multidisciplinary field of cloud computing. It is committed to the publication of articles that present innovative research ideas, application results, and case studies in cloud computing, focusing on key technical issues related to theory, algorithms, systems, applications, and performance.