NCC: Neighbor-aware Congestion Control based on Reinforcement Learning for Datacenter Networks

Proceedings of the 51st International Conference on Parallel Processing. Pub Date: 2022-08-29. DOI: 10.1145/3545008.3545074

Haoyu Wang, Kevin Zheng, Charles Reiss, Haiying Shen

Citations: 1

Abstract

The challenges of low-latency, high-throughput datacenter networks create new traffic management problems that require new congestion control mechanisms. Proposals to solve this problem have generally focused either on refining existing window-based congestion control, as in TCP, or on introducing a central controller to make congestion control decisions. In this paper, we propose a third approach, in which nodes share network information with their neighbors and use it to make local decisions that limit global congestion. In our implementation, the rate-limiting decisions on each node are driven by a local agent that uses reinforcement learning to optimize a combination of overall latency, throughput, and the shared information. To make this approach efficient, the local agents choose an overall rate limit for each node, and a separate process then assigns the traffic of individual flows within these limits. We show that, in a trace-driven real implementation, our method achieves better congestion avoidance than several end-to-end and centralized mechanisms from prior work.
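The two-level structure described in the abstract (a per-node reward that mixes local latency, throughput, and neighbor-shared information, followed by a separate per-flow split of the node's rate limit) might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the reward formula, the shared fields, or the flow-assignment rule, so `NodeStats`, the weights `w_tput`, `w_lat`, `w_nbr`, and the use of max-min fair sharing in `allocate_flows` are all hypothetical stand-ins rather than the authors' method. The reinforcement learning policy that would choose `node_limit` is omitted.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NodeStats:
    """Measurements a node observes locally or receives from a neighbor.

    The fields are assumptions for illustration; the abstract does not
    list what information is shared.
    """
    latency_ms: float
    throughput_gbps: float
    queue_occupancy: float  # fraction of buffer in use, 0.0 to 1.0


def reward(local: NodeStats, neighbors: List[NodeStats],
           w_tput: float = 1.0, w_lat: float = 0.5, w_nbr: float = 0.5) -> float:
    """Hypothetical reward: favor throughput, penalize latency and
    the mean congestion reported by neighbors."""
    if neighbors:
        nbr_congestion = sum(n.queue_occupancy for n in neighbors) / len(neighbors)
    else:
        nbr_congestion = 0.0
    return (w_tput * local.throughput_gbps
            - w_lat * local.latency_ms
            - w_nbr * nbr_congestion)


def allocate_flows(node_limit: float, demands: Dict[str, float]) -> Dict[str, float]:
    """Split a node-level rate limit across flows with max-min fairness.

    Max-min fairness is a stand-in chosen for this sketch, not a rule
    taken from the paper.
    """
    alloc: Dict[str, float] = {}
    remaining = node_limit
    # Serve the smallest demands first so unused share flows to larger ones.
    pending = sorted(demands.items(), key=lambda kv: kv[1])
    while pending:
        fair_share = remaining / len(pending)
        flow, demand = pending.pop(0)
        alloc[flow] = min(demand, fair_share)
        remaining -= alloc[flow]
    return alloc


if __name__ == "__main__":
    local = NodeStats(latency_ms=1.0, throughput_gbps=10.0, queue_occupancy=0.2)
    nbrs = [NodeStats(2.0, 5.0, 0.4), NodeStats(2.0, 5.0, 0.6)]
    print(reward(local, nbrs))
    print(allocate_flows(10.0, {"a": 2.0, "b": 5.0, "c": 8.0}))
```

Separating the two functions mirrors the efficiency argument in the abstract: the learning agent only has to pick one number per node, while the per-flow split is handled by a cheap deterministic step.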
