Maximizing the Benefit of RDMA at End Hosts
Xiaoliang Wang, Hexiang Song, Cam-Tu Nguyen, Dongxu Cheng, Tiancheng Jin
IEEE INFOCOM 2021 - IEEE Conference on Computer Communications, published 2021-05-10
DOI: 10.1109/INFOCOM42981.2021.9488875 (https://doi.org/10.1109/INFOCOM42981.2021.9488875)
Citations: 3
Abstract
RDMA is increasingly deployed in data centers to meet the demands of ultra-low latency, high throughput, and low CPU overhead. However, migrating existing applications from the TCP/IP stack to RDMA is not easy: developers usually need to carefully select communication primitives and manually tune parameters for each single-purpose system. After operating a high-speed RDMA network, we identify multiple hidden costs that may cause degraded and/or unpredictable performance in RDMA-based applications. These hidden costs include, among others, the combination of complicated parameter settings, the scalability of Reliable Connections, two-sided memory management and page alignment, and resource contention among diverse traffic. To address these problems, we introduce Nem, a suite that allows developers to maximize the benefit of RDMA by i) eliminating resource contention at the NIC cache through asynchronous resource sharing; ii) introducing hybrid page management based on message sizes; and iii) isolating flows of different traffic classes based on hardware features. We implement a prototype of Nem and verify its effectiveness by rebuilding an RPC message service, which demonstrates high throughput for large messages and low latency for small messages without compromising low CPU utilization, as well as good scalability for a large number of active connections.