Exploring MPI Communication Models for Graph Applications Using Graph Matching as a Case Study

2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS) Pub Date : 2019-05-01 DOI:10.1109/IPDPS.2019.00085

Sayan Ghosh, M. Halappanavar, A. Kalyanaraman, Arif M. Khan, A. Gebremedhin


Citations: 5

Abstract

Traditional implementations of parallel graph operations on distributed memory platforms are written using Message Passing Interface (MPI) point-to-point communication primitives such as Send-Recv (blocking and nonblocking). Apart from this classical model, the MPI community has over the years added other communication models; however, their suitability for handling the irregular traffic workloads typical of graph operations remains comparatively unexplored. Our aim in this paper is to study these relatively underutilized MPI communication models for graph applications. More specifically, we evaluate MPI's one-sided programming model, or Remote Memory Access (RMA), and nearest neighborhood collectives using a process graph topology. These newer models include features intended to map better to irregular communication patterns, such as those found in graph algorithms. As a concrete application for our case study, we use distributed memory implementations of an approximate weighted graph matching algorithm to compare the performance of MPI-3 RMA and neighborhood collective operations against nonblocking Send-Recv. A matching in a graph is a subset of edges such that no two matched edges are incident on the same vertex. A maximum weight matching is a matching whose weight, computed as the sum of the weights of the matched edges, is maximum. Execution of graph matching is dominated by a high volume of irregular memory accesses, making it an ideal candidate for studying the effects of various MPI communication models on graph applications at scale. Our neighborhood collectives and RMA implementations yield up to 6x speedup over traditional nonblocking Send-Recv implementations on thousands of cores of the NERSC Cori supercomputer. We believe the lessons learned from this study can be adopted to benefit a wider range of graph applications.
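The definitions of a matching and of its weight can be made concrete with a small serial sketch. The abstract does not say which approximate algorithm the paper uses, so the code below assumes the textbook greedy heuristic (scan edges by decreasing weight, keep an edge if both endpoints are still free), which guarantees at least half the maximum weight. The function names and the example graph are hypothetical and chosen for illustration only; this is not the authors' distributed implementation and involves no MPI.

```python
from typing import Iterable, List, Tuple

# An edge is (u, v, weight); vertices are plain integers (illustrative choice).
Edge = Tuple[int, int, float]


def is_matching(edges: Iterable[Edge]) -> bool:
    """Return True if no two edges are incident on the same vertex."""
    seen = set()
    for u, v, _ in edges:
        if u == v or u in seen or v in seen:
            return False
        seen.add(u)
        seen.add(v)
    return True


def matching_weight(edges: Iterable[Edge]) -> float:
    """Weight of a matching: the sum of the weights of its edges."""
    return sum(w for _, _, w in edges)


def greedy_matching(edges: Iterable[Edge]) -> List[Edge]:
    """Greedy half-approximation (an assumed stand-in, not the paper's code).

    Edges are visited in order of decreasing weight; an edge is kept only
    if neither endpoint is already matched.
    """
    matched = set()
    result: List[Edge] = []
    for u, v, w in sorted(edges, key=lambda e: e[2], reverse=True):
        if u != v and u not in matched and v not in matched:
            result.append((u, v, w))
            matched.update((u, v))
    return result


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 - 3 with weights 3, 4, 3.
    path = [(0, 1, 3.0), (1, 2, 4.0), (2, 3, 3.0)]
    approx = greedy_matching(path)
    optimum = [(0, 1, 3.0), (2, 3, 3.0)]
    print(approx, matching_weight(approx))    # [(1, 2, 4.0)] 4.0
    print(optimum, matching_weight(optimum))  # weight 6.0
```

On this path graph the greedy pass keeps only the middle edge (weight 4), while the maximum weight matching takes the two outer edges (weight 6). The result is within the factor-of-two bound, which shows why such algorithms are described as approximate.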



