C-GDR: High-Performance Container-Aware GPUDirect MPI Communication Schemes on RDMA Networks

Jie Zhang, Xiaoyi Lu, Ching-Hsiang Chu, D. Panda
{"title":"C-GDR: High-Performance Container-Aware GPUDirect MPI Communication Schemes on RDMA Networks","authors":"Jie Zhang, Xiaoyi Lu, Ching-Hsiang Chu, D. Panda","doi":"10.1109/IPDPS.2019.00034","DOIUrl":null,"url":null,"abstract":"In recent years, GPU-based platforms have received significant success for parallel applications. In addition to highly optimized computation kernels on GPUs, the cost of data movement on GPU clusters plays critical roles in delivering high performance for end applications. Many recent studies have been proposed to optimize the performance of GPU-or CUDA-aware communication runtimes and these designs have been widely adopted in the emerging GPU-based applications. These studies mainly focus on improving the communication performance on native environments, i.e., physical machines, however GPU-based communication schemes on cloud environments are not well studied yet. This paper first investigates the performance characteristics of state-of-the-art GPU-based communication schemes on both native and container-based environments, which show a significant demand to design high-performance container-aware communication schemes in GPU-enabled runtimes to deliver near-native performance for end applications on clouds. Next, we propose the C-GDR approach to design high-performance Container-aware GPUDirect communication schemes on RDMA networks. C-GDR allows communication runtimes to successfully detect process locality, GPU residency, NUMA, architecture information, and communication pattern to enable intelligent and dynamic selection of the best communication and data movement schemes on GPU-enabled clouds. We have integrated C-GDR with the MVAPICH2 library. Our evaluations show that MVAPICH2 with C-GDR has clear performance benefits on container-based cloud environments, compared to default MVAPICH2-GDR and Open MPI. For instance, our proposed C-GDR can outperform default MVAPICH2-GDR schemes by up to 66% on micro-benchmarks and up to 26% on HPC applications over a container-based environment.","PeriodicalId":403406,"journal":{"name":"2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPS.2019.00034","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

In recent years, GPU-based platforms have achieved significant success for parallel applications. In addition to highly optimized computation kernels on GPUs, the cost of data movement on GPU clusters plays a critical role in delivering high performance for end applications. Many recent studies have optimized the performance of GPU- or CUDA-aware communication runtimes, and these designs have been widely adopted in emerging GPU-based applications. These studies focus mainly on improving communication performance in native environments, i.e., on physical machines; GPU-based communication schemes in cloud environments, however, are not yet well studied. This paper first investigates the performance characteristics of state-of-the-art GPU-based communication schemes in both native and container-based environments, which shows a clear need for high-performance container-aware communication schemes in GPU-enabled runtimes that can deliver near-native performance for end applications on clouds. Next, we propose the C-GDR approach for designing high-performance Container-aware GPUDirect communication schemes on RDMA networks. C-GDR allows communication runtimes to detect process locality, GPU residency, NUMA and architecture information, and communication patterns, enabling intelligent, dynamic selection of the best communication and data-movement schemes on GPU-enabled clouds. We have integrated C-GDR with the MVAPICH2 library. Our evaluations show that MVAPICH2 with C-GDR has clear performance benefits in container-based cloud environments compared to default MVAPICH2-GDR and Open MPI. For instance, our proposed C-GDR outperforms default MVAPICH2-GDR schemes by up to 66% on micro-benchmarks and up to 26% on HPC applications in a container-based environment.
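To make the selection idea in the abstract concrete, the following is a minimal sketch (not the actual MVAPICH2 C-GDR code) of how a container-aware runtime might combine buffer residency, queried via the real CUDA call cudaPointerGetAttributes, with process-locality information to pick a channel. The names select_channel, is_gpu_buffer, channel_t, and the same_node flag are hypothetical illustrations; the real runtime also weighs NUMA, architecture, and communication-pattern information.

/* Illustrative sketch, assuming a locality detector that can tell whether
 * two processes share a physical node even when they run in different
 * containers (per-container namespaces hide this by default). */
#include <cuda_runtime.h>
#include <stdbool.h>

typedef enum {
    CHAN_SHARED_MEM,   /* host<->host, same node (possibly across containers) */
    CHAN_CUDA_IPC,     /* GPU<->GPU, same node: CUDA IPC avoids host staging  */
    CHAN_GDR_RDMA,     /* GPU<->GPU, across nodes: GPUDirect RDMA             */
    CHAN_NET_RDMA      /* host buffers across nodes: plain RDMA               */
} channel_t;

/* Returns true if ptr refers to GPU-resident (device) memory. */
static bool is_gpu_buffer(const void *ptr)
{
    struct cudaPointerAttributes attr;
    if (cudaPointerGetAttributes(&attr, ptr) != cudaSuccess)
        return false;                  /* unregistered host memory */
    return attr.type == cudaMemoryTypeDevice;
}

/* same_node comes from the (assumed) container-aware locality detector. */
channel_t select_channel(const void *buf, bool same_node)
{
    bool gpu = is_gpu_buffer(buf);
    if (same_node)
        return gpu ? CHAN_CUDA_IPC : CHAN_SHARED_MEM;
    return gpu ? CHAN_GDR_RDMA : CHAN_NET_RDMA;
}

The container-aware part is the locality test: two containers on the same host look like separate nodes to a naive runtime, forcing GPU traffic through the loopback network path; detecting co-residence lets the runtime fall back to CUDA IPC or shared memory and recover near-native performance.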