Haoyu Wang, Kevin Zheng, Charles Reiss, Haiying Shen
Proceedings of the 51st International Conference on Parallel Processing, published 2022-08-29. DOI: 10.1145/3545008.3545074
NCC: Neighbor-aware Congestion Control based on Reinforcement Learning for Datacenter Networks
The challenges of low-latency, high-throughput datacenter networks create new traffic management problems that require new congestion control mechanisms. Prior proposals have generally focused either on refining existing window-based congestion control, as in TCP, or on introducing a central controller that makes congestion control decisions. In this paper, we propose a third approach, in which nodes share network information with their neighbors and use it to make local decisions that limit global congestion. In our implementation, each node's rate-limiting decisions are driven by a local agent that uses reinforcement learning to optimize a combination of overall latency, throughput, and the shared information. To make this approach efficient, the local agents choose an overall rate limit for each node, and a separate process then assigns the traffic of individual flows within these limits. We show, in trace-driven experiments on a real implementation, that our method achieves better congestion avoidance than several end-to-end and centralized mechanisms from prior work.
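The abstract describes a two-level design: a per-node agent picks an aggregate rate limit while optimizing latency, throughput, and neighbor-shared information, and a separate process divides that limit among flows. It does not specify the reward function or the per-flow allocation policy, so the sketch below is a hypothetical illustration only. It assumes a linear reward with illustrative weights `w_tput`, `w_lat`, and `w_nbr`, and it assumes max-min fair (water-filling) division of the node limit across flows. `NodeObservation`, `reward`, and `allocate_flows` are hypothetical names, and the reinforcement learning algorithm that would consume the reward is not shown.

```python
from dataclasses import dataclass


@dataclass
class NodeObservation:
    """Hypothetical per-interval measurements for one node (not from the paper)."""
    avg_latency_us: float       # mean latency observed by this node
    throughput_gbps: float      # aggregate throughput sent by this node
    neighbor_congestion: float  # assumed shared signal from neighbors, in [0, 1]


def reward(obs: NodeObservation, w_tput: float = 1.0, w_lat: float = 0.01,
           w_nbr: float = 1.0) -> float:
    """Assumed linear reward: favor throughput, penalize latency and
    neighbor-reported congestion. The weights are illustrative only."""
    return (w_tput * obs.throughput_gbps
            - w_lat * obs.avg_latency_us
            - w_nbr * obs.neighbor_congestion)


def allocate_flows(node_limit: float, demands: dict[str, float]) -> dict[str, float]:
    """Split a node-level rate limit across flows with max-min fairness
    (water-filling). This policy is an assumption, not the paper's method."""
    alloc = {flow: 0.0 for flow in demands}
    remaining = node_limit
    # Visit flows from smallest to largest demand.
    pending = sorted(demands, key=demands.get)
    while pending and remaining > 0:
        share = remaining / len(pending)
        flow = pending[0]
        if demands[flow] <= share:
            # The smallest demand fits in an equal share: satisfy it fully.
            alloc[flow] = demands[flow]
            remaining -= demands[flow]
            pending.pop(0)
        else:
            # No remaining flow fits in an equal share: split the rest evenly.
            for f in pending:
                alloc[f] = share
            remaining = 0.0
            pending = []
    return alloc


if __name__ == "__main__":
    # A 10-unit node limit shared by three flows with different demands.
    print(allocate_flows(10.0, {"a": 2.0, "b": 5.0, "c": 8.0}))
    # → {'a': 2.0, 'b': 4.0, 'c': 4.0}
```

Separating the two levels keeps the learned action small (one rate per node) regardless of how many flows a node carries, which matches the efficiency motivation stated in the abstract; swapping in a different per-flow policy would not change the agent's action space.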