Congestion Control for Large-Scale RDMA Deployments
Yibo Zhu, Haggai Eran, D. Firestone, Chuanxiong Guo, Marina Lipshteyn, Yehonatan Liron, J. Padhye, S. Raindel, M. H. Yahia, Ming Zhang
Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication (SIGCOMM), August 17, 2015. DOI: https://doi.org/10.1145/2785956.2787484
Citations: 455
Abstract
Modern datacenter applications demand high throughput (40Gbps) and ultra-low latency (< 10 μs per hop) from the network, with low CPU overhead. Standard TCP/IP stacks cannot meet these requirements, but Remote Direct Memory Access (RDMA) can. On IP-routed datacenter networks, RDMA is deployed using the RoCEv2 protocol, which relies on Priority-based Flow Control (PFC) to enable a drop-free network. However, PFC can lead to poor application performance due to problems like head-of-line blocking and unfairness. To alleviate these problems, we introduce DCQCN, an end-to-end congestion control scheme for RoCEv2. To optimize DCQCN performance, we build a fluid model and provide guidelines for tuning switch buffer thresholds and other protocol parameters. Using a 3-tier Clos network testbed, we show that DCQCN dramatically improves the throughput and fairness of RoCEv2 RDMA traffic. DCQCN is implemented in Mellanox NICs and is being deployed in Microsoft's datacenters.
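At its core, DCQCN is a rate-based control loop: switches ECN-mark packets when queues build, the receiver echoes Congestion Notification Packets (CNPs) back to the sender, and the sender NIC cuts its rate multiplicatively on each CNP and recovers in stages when CNPs stop arriving. The sketch below is a minimal, illustrative model of that sender-side (reaction point) update rule; the class name, parameter names, and the simplified single-flow, single-timer structure are assumptions made for illustration, not the paper's exact NIC-firmware implementation.

```python
# Minimal sketch of a DCQCN-style sender (reaction point) rate update.
# Assumption: names and the single-timer structure are illustrative; the
# real algorithm runs in NIC firmware with separate alpha-update and
# rate-increase timers/byte counters, plus a hyper-increase stage.

class DcqcnSender:
    def __init__(self, line_rate_gbps=40.0, g=1 / 256, rate_ai_gbps=0.5):
        self.rc = line_rate_gbps      # current sending rate
        self.rt = line_rate_gbps      # target rate to recover toward
        self.alpha = 1.0              # congestion estimate (DCTCP-style EWMA)
        self.g = g                    # EWMA gain for alpha
        self.rate_ai = rate_ai_gbps   # additive-increase step
        self.quiet_rounds = 0         # update rounds since the last CNP

    def on_cnp(self):
        """A CNP arrived (receiver saw an ECN mark): cut rate."""
        self.rt = self.rc                          # remember pre-cut rate
        self.rc *= 1 - self.alpha / 2              # multiplicative decrease
        self.alpha = (1 - self.g) * self.alpha + self.g
        self.quiet_rounds = 0

    def on_update_no_cnp(self, fast_recovery_rounds=5):
        """Periodic update with no CNP seen: decay alpha, raise rate."""
        self.alpha *= 1 - self.g                   # congestion is abating
        self.quiet_rounds += 1
        if self.quiet_rounds > fast_recovery_rounds:
            self.rt += self.rate_ai                # additive increase phase
        self.rc = (self.rt + self.rc) / 2          # move halfway to target
```

Two design points this sketch tries to capture: the cut size scales with `alpha`, so persistent congestion produces deeper cuts, and recovery first moves halfway back toward the pre-cut rate (fast recovery, borrowed from QCN) before additively probing beyond it, which is what makes the scheme converge to fair shares without PFC-style head-of-line blocking.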