NCC: Neighbor-aware Congestion Control based on Reinforcement Learning for Datacenter Networks
Haoyu Wang, Kevin Zheng, Charles Reiss, Haiying Shen
Proceedings of the 51st International Conference on Parallel Processing, published 2022-08-29
DOI: https://doi.org/10.1145/3545008.3545074
Citation count: 1
Abstract
The challenges of low-latency, high-throughput datacenter networks create new traffic management problems that require new congestion control mechanisms. Proposals to solve this problem have generally focused either on refining existing window-based congestion control, as in TCP, or on introducing a central controller to make congestion control decisions. In this paper, we propose a third approach, in which nodes share network information with their neighbors and use it to make local decisions that limit global congestion. In our implementation, the rate-limiting decisions on each node are driven by a local agent that uses reinforcement learning to optimize a combination of overall latency, throughput, and the shared information. To make this approach efficient, the local agents choose an overall rate limit for each node, and a separate process then assigns the traffic of individual flows within these limits. We show that, in a trace-driven real implementation, our method achieves better congestion avoidance than several end-to-end and centralized mechanisms from prior work.
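The two-level structure described in the abstract (a per-node rate limit chosen by a learning agent, then a separate per-flow assignment within that limit) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the linear `reward` form and its weights, the use of neighbor queue lengths as the shared information, and the max-min fair split in `split_rate_limit` are all hypothetical stand-ins.

```python
from typing import Dict, List


def reward(throughput: float, latency: float, neighbor_queues: List[float],
           w_tput: float = 1.0, w_lat: float = 1.0, w_nbr: float = 1.0) -> float:
    """Hypothetical reward: favor throughput, penalize latency and neighbor congestion.

    The linear form and the weights are assumptions, not the paper's reward.
    """
    if neighbor_queues:
        nbr_penalty = sum(neighbor_queues) / len(neighbor_queues)
    else:
        nbr_penalty = 0.0
    return w_tput * throughput - w_lat * latency - w_nbr * nbr_penalty


def split_rate_limit(limit: float, demands: Dict[str, float]) -> Dict[str, float]:
    """Split a node-level rate limit across flows with max-min fairness.

    Max-min fairness is an assumed policy standing in for the paper's
    unspecified per-flow assignment process.
    """
    alloc = {flow: 0.0 for flow in demands}
    remaining = limit
    # Visit flows in order of increasing demand (water-filling).
    pending = sorted(demands, key=demands.get)
    while pending:
        share = remaining / len(pending)
        flow = pending[0]
        if demands[flow] <= share:
            # The smallest demand fits in an equal share: satisfy it fully.
            alloc[flow] = demands[flow]
            remaining -= demands[flow]
            pending.pop(0)
        else:
            # No remaining demand fits: give every pending flow the equal share.
            for f in pending:
                alloc[f] = share
            break
    return alloc
```

In this sketch, an agent would score each candidate node-level limit with `reward` and pass the chosen limit to `split_rate_limit`, so the learning problem stays at node granularity instead of growing with the number of flows.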