On the suitability of MPI as a PGAS runtime

2014 21st International Conference on High Performance Computing (HiPC), published 2014-12-01. DOI: 10.1109/HiPC.2014.7116712

J. Daily, Abhinav Vishnu, B. Palmer, H. V. Dam, D. Kerbyson

Citations: 29

Abstract

Partitioned Global Address Space (PGAS) models are emerging as a popular alternative to MPI for designing scalable applications. At the same time, MPI remains a ubiquitous communication subsystem owing to its standardization, high performance, and availability on leading platforms. In this paper, we explore the suitability of MPI as a scalable PGAS communication subsystem. We focus on Remote Memory Access (RMA) communication in PGAS models, which typically includes get, put, and atomic memory operations. We perform an in-depth exploration of MPI-based design alternatives. These include a semantically matching interface, MPI-RMA, as well as less intuitive options such as MPI two-sided communication combined with multithreading and dynamic process management. Drawing on this exploration of the alternatives and their shortcomings, we propose a novel design enabled by the data-centric view of PGAS models. This design combines highly tuned MPI two-sided semantics with an automatic, user-transparent split of MPI communicators to provide asynchronous progress. We implement this asynchronous progress ranks (PR) approach, along with the other approaches, within the Communication Runtime for Exascale (ComEx), a communication subsystem for Global Arrays. Our performance evaluation spans pure communication benchmarks, graph community detection and sparse matrix-vector multiplication kernels, and a computational chemistry application. The utility of the proposed PR-based approach is demonstrated by a 2.17x speedup on 1008 processors over the other MPI-based designs.
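The progress-rank idea in the abstract can be illustrated as pure bookkeeping: split the world communicator into ranks the application sees and hidden ranks that service remote get/put/atomic requests via two-sided messages. This is a minimal sketch under stated assumptions, not the paper's ComEx implementation. It assumes ranks are packed onto nodes in contiguous blocks and that the last rank of each node is reserved as that node's progress rank; the helper names `split_world`, `progress_rank_for`, and `route_put` are hypothetical. It models only the grouping that `MPI_Comm_split(color, key)` would produce and which rank would receive a request; how the progress rank then accesses the target's memory is outside this sketch.

```python
from typing import Dict, List

# Hypothetical layout model, NOT the paper's code. Assumptions:
#   * world ranks are packed onto nodes in contiguous blocks of ranks_per_node;
#   * the last rank on each node is reserved as that node's progress rank (PR).


def split_world(world_size: int, ranks_per_node: int) -> Dict[int, List[int]]:
    """Group world ranks as MPI_Comm_split(color, key=world rank) would.

    Color 0 = ranks handed to the application; color 1 = progress ranks.
    """
    if ranks_per_node < 2:
        raise ValueError("each node needs at least one user rank and one PR")
    if world_size % ranks_per_node != 0:
        raise ValueError("world_size must be a multiple of ranks_per_node")
    groups: Dict[int, List[int]] = {0: [], 1: []}
    for rank in range(world_size):
        is_pr = rank % ranks_per_node == ranks_per_node - 1
        # Appending in world-rank order mirrors using the world rank as the key,
        # so user ranks keep their original relative order.
        groups[1 if is_pr else 0].append(rank)
    return groups


def progress_rank_for(target_world_rank: int, ranks_per_node: int) -> int:
    """World rank of the PR assumed to service requests on the target's node."""
    node = target_world_rank // ranks_per_node
    return node * ranks_per_node + (ranks_per_node - 1)


def route_put(groups: Dict[int, List[int]], target_user_rank: int,
              ranks_per_node: int) -> int:
    """Map an application-visible target rank to the PR that would receive
    the two-sided message carrying the put. The application never sees PRs."""
    target_world = groups[0][target_user_rank]
    return progress_rank_for(target_world, ranks_per_node)


if __name__ == "__main__":
    groups = split_world(world_size=8, ranks_per_node=4)
    print("user ranks:", groups[0])      # [0, 1, 2, 4, 5, 6]
    print("progress ranks:", groups[1])  # [3, 7]
    print("put to user rank 4 ->", route_put(groups, 4, ranks_per_node=4))  # 7
```

With 8 ranks on 2 nodes, the application would see a 6-rank communicator, and a put addressed to its rank 4 (world rank 5) would be sent to the progress rank at world rank 7. Keying the split on the world rank preserves the relative order of user ranks, which is one way the split can stay transparent to the application.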
