RoUD: Scalable RDMA over UD in Lossy Data Center Networks
Zhiqiang He, Yuxin Chen, Bei Hua
2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid), published 2023-05-01
DOI: https://doi.org/10.1109/CCGrid57682.2023.00014
Citation count: 0
Abstract
Remote direct memory access (RDMA) has been widely deployed in data centers because it offers lower latency and higher throughput than the kernel TCP/IP stack. However, RDMA still faces scalability problems, in both connection scalability and network scalability. In this paper, we present RoUD, a userspace network stack that leverages the unreliable datagram (UD) transport mode of RDMA to improve connection scalability. RoUD also eliminates the dependency on priority-based flow control (PFC) in data center networks, thereby enhancing network scalability. RoUD implements three performance optimizations in the userspace network stack and introduces two types of flow control that prevent packet loss on the host, preserving high performance. We built a prototype of RoUD based on the standard InfiniBand Verbs library. Evaluation on a testbed with 100 Gbps RNICs shows that, with a large number of connections, its throughput is 1.4× better than that of the widely used reliable connection (RC) transport.
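To make the connection-scalability argument concrete, the following sketch compares how many queue pairs (QPs) a single host would need under RC and under UD. It is an illustration and is not taken from the paper: it assumes a full mesh in which every thread on a host talks to every other host, that an RC QP is connected to exactly one remote QP, and that a single UD QP can address any number of peers. The function names and the host and thread counts are hypothetical.

```python
def rc_qps_per_host(num_hosts: int, threads_per_host: int) -> int:
    """QPs one host needs under RC.

    Assumption: each local thread keeps one connected QP per remote host,
    because an RC QP is bound to exactly one remote QP.
    """
    return threads_per_host * (num_hosts - 1)


def ud_qps_per_host(threads_per_host: int) -> int:
    """QPs one host needs under UD.

    Assumption: each local thread owns a single UD QP and uses it to
    reach every peer, since UD is connectionless.
    """
    return threads_per_host


if __name__ == "__main__":
    threads = 8  # hypothetical thread count per host
    for hosts in (10, 100, 1000):  # hypothetical cluster sizes
        rc = rc_qps_per_host(hosts, threads)
        ud = ud_qps_per_host(threads)
        print(f"hosts={hosts:5d}  RC QPs/host={rc:6d}  UD QPs/host={ud}")
```

Under these assumptions the RC count grows linearly with cluster size while the UD count stays fixed, which is the kind of per-connection state growth that a UD-based design is meant to avoid. The reliability and flow control that RC provides in hardware must then be supplied by the userspace stack.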