Partitioned Collective Communication

2021 Workshop on Exascale MPI (ExaMPI), Pub Date: 2021-11-01, DOI: 10.1109/ExaMPI54564.2021.00007

Daniel J. Holmes, A. Skjellum, Julien Jaeger, Ryan E. Grant, P. Bangalore, Matthew G. F. Dosanjh, Amanda Bienz, Derek Schafer


Citations: 5

Abstract

Partitioned point-to-point communication and persistent collective communication were both recently standardized in MPI-4.0. Each offers performance and scalability advantages over MPI-3.1-based communication when planned transfers are feasible in an MPI application. Their merger into a generalized, persistent collective communication with partitions is a logical next step, with significant advantages for performance portability. Non-trivial decisions about the syntax and semantics of such operations need to be addressed, including the scope of knowledge of partitioning choices by members of the communicator's group(s). This paper introduces and motivates proposed interfaces for partitioned collective communication. Partitioned collectives will be particularly useful for multithreaded, accelerator-offloaded, and/or hardware-collective-enhanced MPI implementations driving suitable applications, as well as for pipelined collective communication (e.g., partitioned allreduce) with single consumers and producers per MPI process. These operations also provide load imbalance mitigation. Halo exchange codes arising from regular and irregular grid/mesh applications are a key candidate class of applications for this functionality. Generalizations of the lightweight notification procedures MPI_Parrived and MPI_Pready are considered. A generalization of MPIX_Pbuf_prepare, a procedure proposed for MPI-4.1 for point-to-point partitioned communication, is also considered and shown in the context of supporting ready-mode send semantics for the operations. The option of providing local and incomplete modes for initialization procedures is mentioned (which could also apply to persistent collective operations); these semantics interact with the MPIX_Pbuf_prepare concept and the progress rule. Finally, future work is outlined, indicating prerequisites for formal consideration for the MPI-5 standard.
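
To make the pipelining idea in the abstract more concrete, the following is a minimal single-process sketch in plain C. It is not the interface proposed in the paper and it makes no MPI calls. The type `pcoll_model` and the functions `model_init`, `model_pready`, and `model_parrived` are hypothetical names invented for this illustration. They only mimic the roles that MPI_Pready (a producer marks one partition as filled) and MPI_Parrived (a consumer tests one partition for completion) play in MPI-4.0 partitioned point-to-point communication, applied here to a sum reduction over simulated contributors. The assumption being illustrated is that a partition can be reduced and consumed as soon as every contributor has marked that partition ready, without waiting for the rest of the buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Sizes are arbitrary values chosen for the illustration. */
enum { NPROCS = 3, NPARTS = 4, PART_LEN = 2 };

/* Hypothetical single-process model of a partitioned sum reduction. */
typedef struct {
    double data[NPROCS][NPARTS * PART_LEN]; /* each contributor's input buffer */
    bool   ready[NPROCS][NPARTS];           /* set by model_pready             */
    bool   reduced[NPARTS];                 /* true once a partition is summed */
    double result[NPARTS * PART_LEN];       /* reduced output buffer           */
} pcoll_model;

/* Clear all buffers and flags. */
void model_init(pcoll_model *m) {
    memset(m, 0, sizeof *m);
}

/* Plays the role of MPI_Pready: contributor `proc` declares that partition
 * `part` of its buffer is filled. When the last contributor marks the same
 * partition, that partition alone is reduced immediately. */
void model_pready(pcoll_model *m, int proc, int part) {
    assert(proc >= 0 && proc < NPROCS);
    assert(part >= 0 && part < NPARTS);
    m->ready[proc][part] = true;

    /* Stop here unless every contributor has marked this partition. */
    for (int p = 0; p < NPROCS; ++p) {
        if (!m->ready[p][part]) {
            return;
        }
    }

    /* Sum the elements of this one partition across all contributors. */
    for (int i = part * PART_LEN; i < (part + 1) * PART_LEN; ++i) {
        double sum = 0.0;
        for (int p = 0; p < NPROCS; ++p) {
            sum += m->data[p][i];
        }
        m->result[i] = sum;
    }
    m->reduced[part] = true;
}

/* Plays the role of MPI_Parrived: test whether the reduced values of
 * partition `part` are available to the consumer. */
bool model_parrived(const pcoll_model *m, int part) {
    assert(part >= 0 && part < NPARTS);
    return m->reduced[part];
}
```

In this model, a consumer can poll `model_parrived` for one partition and start working on its result while slower contributors are still filling other partitions, which is one way to picture the load imbalance mitigation the abstract mentions. A real partitioned collective would additionally have to settle the questions the abstract raises, such as which processes know the partitioning choices of the others.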

