Revisiting RDMA Buffer Registration in the Context of Lightweight Multi-kernels
Balazs Gerofi, Masamichi Takagi, Y. Ishikawa
Proceedings of the 23rd European MPI Users' Group Meeting, 2016-09-25
DOI: 10.1145/2966884.2966888 (https://doi.org/10.1145/2966884.2966888)
Citations: 2
Abstract
Lightweight multi-kernel architectures, where HPC-specialized lightweight kernels (LWKs) run side-by-side with Linux on compute nodes, have received a great deal of attention recently due to their potential for addressing many of the challenges system software faces as we move towards exascale and beyond. LWKs in multi-kernels implement only a limited set of kernel functionality; the rest, for example device drivers for high-performance interconnects, is provided by Linux. While most operations of modern high-performance interconnects are driven entirely from user space, memory registration for remote direct memory access (RDMA) usually involves interaction with the Linux device driver and thus incurs the cost of service offloading. In this paper we introduce various optimizations for multi-kernel LWKs to eliminate the memory registration cost. In particular, we propose a safe RDMA pre-registration mechanism combined with lazy memory unmapping in the LWK. We demonstrate improvements of up to two orders of magnitude in RDMA registration latency and of up to 15% on MPI_Allreduce() for large message sizes.
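To illustrate why pre-registration combined with lazy unmapping can remove offloaded registration calls from the critical path, the following is a minimal toy model, not the mechanism implemented in the paper. The class `ToyLwk`, its methods, and the map/register/unmap workload are hypothetical names and assumptions chosen for illustration. The model only counts how many registration requests would have to be forwarded to Linux.

```python
class ToyLwk:
    """Toy model of an LWK address space (hypothetical, for illustration only)."""

    def __init__(self, pre_register):
        self.pre_register = pre_register
        self.registered = set()   # pages that currently hold an RDMA registration
        self.deferred = set()     # pages the application unmapped, kept mapped lazily
        self.offloads = 0         # registration requests forwarded to Linux

    def _offload_registration(self, pages):
        # Assumption: each forwarded request registers a whole set of pages.
        self.offloads += 1
        self.registered |= pages

    def mmap(self, pages):
        pages = set(pages)
        # Pages whose unmapping was deferred are handed back as they are.
        self.deferred -= pages
        new = pages - self.registered
        if self.pre_register and new:
            # Pre-registration: register at map time, off the communication path.
            self._offload_registration(new)

    def register(self, pages):
        missing = set(pages) - self.registered
        if missing:
            # No valid registration yet, so the request must go to Linux.
            self._offload_registration(missing)

    def munmap(self, pages):
        pages = set(pages)
        if self.pre_register:
            # Lazy unmapping: keep the mapping and its registration for reuse.
            self.deferred |= pages
        else:
            # Eager unmapping: the registration is torn down with the mapping.
            self.registered -= pages


def run(pre_register, iterations=100, npages=16):
    """Repeatedly map, register, and unmap the same buffer; return the offload count."""
    lwk = ToyLwk(pre_register)
    buf = range(npages)
    for _ in range(iterations):
        lwk.mmap(buf)
        lwk.register(buf)
        lwk.munmap(buf)
    return lwk.offloads


if __name__ == "__main__":
    print("offloads without optimization:", run(False))
    print("offloads with pre-registration and lazy unmapping:", run(True))
```

In this model the unoptimized configuration forwards one registration request per iteration, while the optimized one forwards a single request when the buffer is first mapped. The model ignores everything that makes the real problem hard, such as pinning limits, memory pressure, and the safety conditions under which a deferred mapping may be reused.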