ByteTuning: Watermark Tuning for RoCEv2

Impact factor 5.3 · Zone 2 (Computer Science) · JCR Q1 (Computer Science, Information Systems)
Lizhuang Tan;Zhuo Jiang;Kefei Liu;Haoran Wei;Pengfei Huo;Huiling Shi;Wei Zhang;Wei Su
Published in IEEE Transactions on Cloud Computing, vol. 13, no. 1, pp. 303-320, 2025-01-03 (journal article).
DOI: 10.1109/TCC.2025.3525496
URL: https://ieeexplore.ieee.org/document/10820527/

Abstract

RDMA over Converged Ethernet v2 (RoCEv2) is one of the most popular high-speed datacenter networking solutions. Watermark is the general term for the various trigger and release thresholds of RoCEv2 flow control protocols, and configuring it properly is an important factor in RoCEv2 performance. In this paper, we propose ByteTuning, a centralized watermark tuning system for RoCEv2. First, we report three real cases of network performance degradation caused by non-optimal or improper watermark configuration, and we sweep the network performance of different watermark configurations in three typical scenarios, which shows the necessity of watermark tuning. Then, based on the RDMA Fluid model, we model and evaluate the influence of the watermark on RoCEv2 performance. Next, we introduce the design of ByteTuning, which includes three mechanisms: 1) using a simulated annealing algorithm to make the real-time watermark converge to a near-optimal configuration, 2) using network telemetry to optimize the feedback overhead, and 3) compressing the search space to improve tuning efficiency. Finally, we validate the performance of ByteTuning in multiple real datacenter networking environments, and the results show that ByteTuning outperforms existing solutions.
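To make the first mechanism concrete, the following is a minimal sketch of simulated annealing applied to a two-threshold watermark. It is not ByteTuning's implementation: the pair `(kmin, kmax)`, the bounds, the step size, and the quadratic `toy_cost` are all hypothetical stand-ins chosen for illustration. In a real deployment the cost would presumably come from measured network feedback rather than a closed-form function.

```python
import math
import random

# Hypothetical search bounds and step size for the thresholds, in KB (assumptions).
LOW, HIGH, STEP = 10, 1000, 10


def toy_cost(cfg):
    """Stand-in for measured performance: lower is better.

    Assumes, purely for illustration, that (100, 400) is the best setting.
    """
    kmin, kmax = cfg
    return ((kmin - 100) / 100) ** 2 + ((kmax - 400) / 100) ** 2


def neighbor(cfg, rng):
    """Nudge one threshold by one step, staying in bounds with kmin < kmax."""
    kmin, kmax = cfg
    delta = rng.choice((-STEP, STEP))
    if rng.random() < 0.5:
        kmin = min(max(kmin + delta, LOW), HIGH)
    else:
        kmax = min(max(kmax + delta, LOW), HIGH)
    # Reject candidates that violate the ordering constraint.
    return (kmin, kmax) if kmin < kmax else cfg


def anneal(cost, start, t0=1.0, cooling=0.95, steps=500, seed=0):
    """Minimise cost(cfg) by simulated annealing with geometric cooling."""
    rng = random.Random(seed)
    current, current_cost = start, cost(start)
    best, best_cost = current, current_cost
    temp = t0
    for _ in range(steps):
        cand = neighbor(current, rng)
        cand_cost = cost(cand)
        diff = cand_cost - current_cost
        # Always accept improvements; accept regressions with
        # probability exp(-diff / temp), which shrinks as temp cools.
        if diff <= 0 or rng.random() < math.exp(-diff / temp):
            current, current_cost = cand, cand_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temp = max(temp * cooling, 1e-9)
    return best, best_cost


best, best_cost = anneal(toy_cost, start=(20, 800))
print("best watermark (KB):", best, "cost:", round(best_cost, 4))
```

Accepting occasional regressions while the temperature is high is what lets the search leave a poor local optimum. Shrinking the bounds or enlarging the step is one simple way to picture a compressed search space, although the abstract does not say how ByteTuning actually compresses it.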
Source journal: IEEE Transactions on Cloud Computing (Computer Science-Software)
CiteScore: 9.40
Self-citation rate: 6.20%
Articles published: 167
Journal description: The IEEE Transactions on Cloud Computing (TCC) is dedicated to the multidisciplinary field of cloud computing. It is committed to the publication of articles that present innovative research ideas, application results, and case studies in cloud computing, focusing on key technical issues related to theory, algorithms, systems, applications, and performance.