Partitioned Collective Communication

Daniel J. Holmes, A. Skjellum, Julien Jaeger, Ryan E. Grant, P. Bangalore, Matthew G. F. Dosanjh, Amanda Bienz, Derek Schafer
Published in: 2021 Workshop on Exascale MPI (ExaMPI), 2021-11-01. DOI: 10.1109/ExaMPI54564.2021.00007 (https://doi.org/10.1109/ExaMPI54564.2021.00007)
Citations: 5

Abstract

Partitioned point-to-point communication and persistent collective communication were both recently standardized in MPI-4.0. Each offers performance and scalability advantages over MPI-3.1-based communication when planned transfers are feasible in an MPI application. Their merger into a generalized, persistent collective communication with partitions is a logical next step, with significant advantages for performance portability. Non-trivial decisions about the syntax and semantics of such operations need to be addressed, including the scope of knowledge of partitioning choices by members of the communicator's group(s). This paper introduces and motivates proposed interfaces for partitioned collective communication. Partitioned collectives will be particularly useful for multithreaded, accelerator-offloaded, and/or hardware-collective-enhanced MPI implementations driving suitable applications, as well as for pipelined collective communication (e.g., partitioned allreduce) with single consumers and producers per MPI process. These operations also provide load imbalance mitigation. Halo exchange codes arising from regular and irregular grid/mesh applications are a key candidate class of applications for this functionality. Generalizations of the lightweight notification procedures MPI_Parrived and MPI_Pready are considered. A generalization of MPIX_Pbuf_prepare, a procedure proposed for MPI-4.1 for point-to-point partitioned communication, is also considered and is shown in the context of supporting ready-mode send semantics for the operations. The option of providing local and incomplete modes for initialization procedures is mentioned (which could also apply to persistent collective operations); these semantics interact with the MPIX_Pbuf_prepare concept and the progress rule. Last, future work is outlined, indicating prerequisites for formal consideration for the MPI-5 standard.
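The pipelining idea behind a partitioned allreduce can be illustrated with a small single-process model. This is only a conceptual sketch: it does not call MPI, and it does not reproduce the interfaces proposed in the paper. The class PartitionedAllreduceModel and its methods pready, parrived, and result are hypothetical names, borrowed loosely from MPI_Pready and MPI_Parrived, and the sum reduction over integer lists is an assumption made for illustration. The model shows one property: a partition becomes usable as soon as every rank has marked it ready, regardless of the state of other partitions.

```python
class PartitionedAllreduceModel:
    """Toy in-process model of a partitioned allreduce (sum). Not MPI."""

    def __init__(self, num_ranks, num_partitions):
        self.num_ranks = num_ranks
        self.num_partitions = num_partitions
        # For each partition: the set of ranks that have marked it ready.
        self._ready = [set() for _ in range(num_partitions)]
        # For each partition: running element-wise sum of contributions.
        self._sums = [None] * num_partitions

    def pready(self, rank, partition, values):
        """Model a rank marking one partition of its send buffer as ready."""
        if rank in self._ready[partition]:
            raise ValueError("rank already marked this partition ready")
        if self._sums[partition] is None:
            self._sums[partition] = list(values)
        else:
            if len(values) != len(self._sums[partition]):
                raise ValueError("partition length mismatch")
            self._sums[partition] = [
                a + b for a, b in zip(self._sums[partition], values)
            ]
        self._ready[partition].add(rank)

    def parrived(self, partition):
        """Return True once every rank has contributed this partition."""
        return len(self._ready[partition]) == self.num_ranks

    def result(self, partition):
        """Return the reduced partition; only valid after parrived() is True."""
        if not self.parrived(partition):
            raise RuntimeError("partition has not fully arrived")
        return list(self._sums[partition])


if __name__ == "__main__":
    model = PartitionedAllreduceModel(num_ranks=3, num_partitions=2)
    # All three ranks finish partition 0 early.
    for rank in range(3):
        model.pready(rank, 0, [rank, 10 * rank])
    # Only rank 0 has finished partition 1; ranks 1 and 2 are still computing.
    model.pready(0, 1, [1, 1])
    print(model.parrived(0), model.parrived(1))
    # Partition 0 can be consumed while partition 1 is still outstanding.
    print(model.result(0))
```

In this model a consumer polls parrived per partition and starts working on completed partitions while slower ranks are still producing the rest, which is the sense in which partitioning can hide load imbalance. A real implementation would additionally need persistent initialization, start and completion calls, and agreement among processes on how buffers are partitioned, all of which the sketch leaves out.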