Hierarchical Adaptive Learning-Based Congestion Control With Low Training Overhead for Datacenter Networks

Impact factor: 5.4 | Zone 2 (Computer Science) | JCR Q1 (Computer Science, Information Systems)

IEEE Transactions on Network and Service Management, vol. 22, no. 5, pp. 4061-4069. Pub Date: 2025-07-16. DOI: 10.1109/TNSM.2025.3589637

Jinbin Hu, Zikai Zhou, Jing Wang


Citations: 0

Abstract


Hierarchical Adaptive Learning-Based Congestion Control With Low Training Overhead for Datacenter Networks

Most congestion control mechanisms perform well in specific datacenter networks, but none consistently delivers good performance across varying scenarios. Recently proposed frameworks based on reinforcement learning can flexibly select congestion control algorithms to adapt to dynamic networks. However, frequently altering the congestion control mechanism during relatively stable periods actually leads to instability and unnecessary computational overhead. In this paper, we propose a lightweight and hierarchical adaptive congestion control algorithm (LACC) that is resilient to varying network conditions. To preserve network stability, LACC dynamically selects an appropriate congestion control mechanism only when the current algorithm is not suitable for the current network state, rather than changing the congestion control scheme every training cycle. Simulation results show that LACC reduces the average overhead by 31% and improves throughput by up to 47%, 35%, 23% and 15% compared to Cubic, Reno, BBR and Antelope, respectively.
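The abstract does not give LACC's trigger condition, state features, or learning model, so the following is only a minimal Python sketch of the general idea it describes: invoke the (comparatively expensive) selector only when the current algorithm looks unsuitable, instead of on every cycle. Every name here (`NetworkState`, `is_unsuitable`, `GatedSelector`, `toy_policy`), the 20% tolerance, and the toy policy itself are hypothetical stand-ins, not the authors' design.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class NetworkState:
    """Hypothetical per-cycle measurements; the paper's real features are not given."""
    throughput_mbps: float
    rtt_ms: float
    loss_rate: float


def is_unsuitable(state: NetworkState, baseline: NetworkState,
                  tolerance: float = 0.2) -> bool:
    """Assumed trigger: throughput fell, or RTT rose, by more than `tolerance`
    relative to the baseline recorded when the current algorithm was chosen."""
    throughput_drop = state.throughput_mbps < (1 - tolerance) * baseline.throughput_mbps
    rtt_inflation = state.rtt_ms > (1 + tolerance) * baseline.rtt_ms
    return throughput_drop or rtt_inflation


class GatedSelector:
    """Calls the selection policy only when the trigger fires."""

    def __init__(self, candidates: List[str],
                 choose: Callable[[NetworkState, List[str]], str],
                 initial: str) -> None:
        self.candidates = candidates
        self.choose = choose          # stand-in for a learned policy
        self.current = initial
        self.baseline: Optional[NetworkState] = None
        self.selector_calls = 0       # how often the policy actually ran

    def step(self, state: NetworkState) -> str:
        if self.baseline is None:
            # First observation: just record a baseline, keep the initial choice.
            self.baseline = state
            return self.current
        if is_unsuitable(state, self.baseline):
            self.selector_calls += 1
            self.current = self.choose(state, self.candidates)
            self.baseline = state     # re-baseline after a (re)selection
        return self.current


def toy_policy(state: NetworkState, candidates: List[str]) -> str:
    """Toy rule used only for this demo; not a recommendation."""
    return "bbr" if state.loss_rate > 0.01 else "cubic"


states = [
    NetworkState(900.0, 1.0, 0.0),
    NetworkState(880.0, 1.1, 0.0),    # small drift: no reselection
    NetworkState(500.0, 3.0, 0.02),   # large change: trigger fires
    NetworkState(510.0, 2.9, 0.02),   # stable again: no reselection
]
selector = GatedSelector(["cubic", "reno", "bbr"], toy_policy, initial="cubic")
history = [selector.step(s) for s in states]
print(history, selector.selector_calls)
```

In this toy run the policy is consulted once across four cycles, which illustrates where the overhead saving would come from. A real system would replace `toy_policy` with a trained model and `is_unsuitable` with whatever suitability test the paper defines.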


Source journal

IEEE Transactions on Network and Service Management (Computer Science: Computer Networks and Communications)

CiteScore: 9.30

Self-citation rate: 15.10%

Articles published: 325

About the journal: IEEE Transactions on Network and Service Management will publish (online only) peer-reviewed, archival-quality papers that advance the state of the art and practical applications of network and service management. Theoretical research contributions (presenting new concepts and techniques) and applied contributions (reporting on experiences and experiments with actual systems) will be encouraged. These transactions will focus on the key technical issues related to: Management Models, Architectures and Frameworks; Service Provisioning, Reliability and Quality Assurance; Management Functions; Enabling Technologies; Information and Communication Models; Policies; Applications and Case Studies; Emerging Technologies and Standards.