Isomorphic, Sparse MPI-like Collective Communication Operations for Parallel Stencil Computations

J. Träff, F. Lübbe, Antoine Rougier, S. Hunold
{"title":"Isomorphic, Sparse MPI-like Collective Communication Operations for Parallel Stencil Computations","authors":"J. Träff, F. Lübbe, Antoine Rougier, S. Hunold","doi":"10.1145/2802658.2802663","DOIUrl":null,"url":null,"abstract":"We propose a specification and discuss implementations of collective operations for parallel stencil-like computations that are not supported well by the current MPI 3.1 neighborhood collectives. In our isomorphic, sparse collectives all processes partaking in the communication operation use similar neighborhoods of processes with which to exchange data. Our interface assumes the p processes to be arranged in a d-dimensional torus (mesh) over which neighborhoods are specified per process by identical lists of relative coordinates. This extends significantly on the functionality for Cartesian communicators, and is a much lighter mechanism than distributed graph topologies. It allows for fast, local computation of communication schedules, and can be used in more dynamic contexts than current MPI functionality. We sketch three algorithms for neighborhoods with s source and target neighbors, namely a) a direct algorithm taking s communication rounds, b) a message-combining algorithm that communicates only along torus coordinates, and c) a message-combining algorithm using between [log s] and [log p] communication rounds. Our concrete interface has been implemented using the direct algorithm a). We benchmark our implementations and compare to the MPI neighborhood collectives. We demonstrate significant advantages in set-up times, and comparable communication times. Finally, we use our isomorphic, sparse collectives to implement a stencil computation with a deep halo, and discuss derived datatypes required for this application.","PeriodicalId":365272,"journal":{"name":"Proceedings of the 22nd European MPI Users' Group Meeting","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"14","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 22nd European MPI Users' Group Meeting","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2802658.2802663","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 14

Abstract

We propose a specification and discuss implementations of collective operations for parallel stencil-like computations that are not supported well by the current MPI 3.1 neighborhood collectives. In our isomorphic, sparse collectives all processes partaking in the communication operation use similar neighborhoods of processes with which to exchange data. Our interface assumes the p processes to be arranged in a d-dimensional torus (mesh) over which neighborhoods are specified per process by identical lists of relative coordinates. This extends significantly on the functionality for Cartesian communicators, and is a much lighter mechanism than distributed graph topologies. It allows for fast, local computation of communication schedules, and can be used in more dynamic contexts than current MPI functionality. We sketch three algorithms for neighborhoods with s source and target neighbors, namely a) a direct algorithm taking s communication rounds, b) a message-combining algorithm that communicates only along torus coordinates, and c) a message-combining algorithm using between ⌈log s⌉ and ⌈log p⌉ communication rounds. Our concrete interface has been implemented using the direct algorithm a). We benchmark our implementations and compare to the MPI neighborhood collectives. We demonstrate significant advantages in set-up times, and comparable communication times. Finally, we use our isomorphic, sparse collectives to implement a stencil computation with a deep halo, and discuss derived datatypes required for this application.
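
As an illustration of the direct algorithm a), the sketch below shows how each process on a d-dimensional torus could locally translate the (identical) list of relative neighbor offsets into absolute ranks and then exchange one data block per neighbor in s MPI_Sendrecv rounds. This is a minimal sketch under assumptions: the names iso_neighbor_alltoall, torus_rank, torus_coords, and the row-major rank layout are illustrative inventions, not the interface or implementation proposed in the paper.

    /* Hypothetical sketch of the direct algorithm a): every process holds
     * the SAME list of s relative offsets on a d-dimensional torus,
     * converts them locally into absolute ranks, and exchanges data in
     * s Sendrecv rounds. Names are illustrative, not the paper's API. */
    #include <mpi.h>
    #include <stdlib.h>

    /* Absolute rank of the process at coords + offset on a row-major torus. */
    static int torus_rank(int d, const int dims[], const int coords[],
                          const int offset[])
    {
        int rank = 0;
        for (int i = 0; i < d; i++) {
            int c = ((coords[i] + offset[i]) % dims[i] + dims[i]) % dims[i];
            rank = rank * dims[i] + c;
        }
        return rank;
    }

    /* Decompose a rank into torus coordinates (row-major). */
    static void torus_coords(int rank, int d, const int dims[], int coords[])
    {
        for (int i = d - 1; i >= 0; i--) {
            coords[i] = rank % dims[i];
            rank /= dims[i];
        }
    }

    /* Direct sparse exchange: one Sendrecv per neighbor, i.e. s rounds.
     * sendbuf/recvbuf hold count elements of type per neighbor; offsets is
     * a flattened s-by-d array of relative coordinates, identical on all
     * processes (the isomorphic property). */
    static void iso_neighbor_alltoall(const char *sendbuf, char *recvbuf,
                                      int count, MPI_Datatype type,
                                      int d, const int dims[],
                                      int s, const int *offsets,
                                      MPI_Comm comm)
    {
        int rank;
        MPI_Aint lb, extent;
        MPI_Comm_rank(comm, &rank);
        MPI_Type_get_extent(type, &lb, &extent);

        int *coords = malloc((size_t)d * sizeof(int));
        int *neg    = malloc((size_t)d * sizeof(int));
        torus_coords(rank, d, dims, coords);

        for (int k = 0; k < s; k++) {
            const int *off = offsets + (size_t)k * d;
            for (int i = 0; i < d; i++)
                neg[i] = -off[i];
            /* Since the offset list is identical everywhere, the process
             * that sends to me in round k sits at -offsets[k]. */
            int to   = torus_rank(d, dims, coords, off);
            int from = torus_rank(d, dims, coords, neg);
            MPI_Sendrecv(sendbuf + (MPI_Aint)k * count * extent, count, type, to,   0,
                         recvbuf + (MPI_Aint)k * count * extent, count, type, from, 0,
                         comm, MPI_STATUS_IGNORE);
        }
        free(coords);
        free(neg);
    }

The message-combining variants b) and c) would replace these per-neighbor rounds with staged communication along torus dimensions, respectively a logarithmic schedule; they are not shown here.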