You've Got Mail (YGM): Building Missing Asynchronous Communication Primitives
Benjamin W. Priest, Trevor Steil, G. Sanders, R. Pearce
2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), May 2019
DOI: 10.1109/IPDPSW.2019.00045
Citations: 11
Abstract
The Message Passing Interface (MPI) is the de facto standard for message handling in distributed computing. MPI collective communication schemes, in which many processors communicate with one another, depend upon synchronous handshake agreements. As a result, applications that rely on iterative collective communication move at the speed of their slowest processors. We describe a methodology for bootstrapping asynchronous communication primitives onto MPI, with an emphasis on the irregular and imbalanced all-to-all communication patterns found in many data analytics applications. In such applications, the communication payload between a pair of processors is often small, requiring message aggregation to perform well on modern networks. In this work, we develop novel routing schemes that divide routing logically into local and remote routing. In these schemes, each core on a node is responsible for handling all local node sends and/or receives with a subset of remote cores. Collective communications route messages through their designated intermediaries and are not influenced by the availability of cores not on their route. Unlike conventional synchronous collectives, cores participating in these schemes can enter the protocol when ready and exit once all of their sends and receives are processed. We demonstrate, using simple benchmarks, how this collective communication improves overall wall-clock performance, as well as bandwidth and core utilization, for applications with a high demand for arbitrary core-to-core communication and unequal computational load between cores.
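To make the aggregation and asynchronous-progress ideas concrete, the sketch below shows a minimal MPI "mailbox" in C++: each rank buffers small payloads per destination, flushes a buffer with a nonblocking synchronous-mode send once it crosses a threshold, polls for incoming batches while it works, and detects termination with a nonblocking barrier once its own sends have completed. This is not the paper's YGM implementation; the `Mailbox` type, the `kTag` and `kFlushSize` constants, and the flat rank-to-rank routing are illustrative assumptions, and YGM's split into local and remote routing stages with designated intermediary cores is omitted here.

```cpp
// A minimal sketch (not the paper's YGM implementation) of asynchronous,
// aggregated point-to-point messaging over MPI. Buffer sizes, the tag, and
// the int payload type are illustrative assumptions.
#include <mpi.h>

#include <cstddef>
#include <cstdio>
#include <vector>

constexpr int kTag = 42;                  // assumed message tag
constexpr std::size_t kFlushSize = 1024;  // assumed aggregation threshold (ints)

struct Mailbox {
  std::vector<std::vector<int>> out;        // one aggregation buffer per destination rank
  std::vector<std::vector<int>> in_flight;  // keeps send buffers alive until completion
  std::vector<MPI_Request> pending;         // outstanding nonblocking sends

  explicit Mailbox(int nranks) : out(nranks) {}

  // Queue a small payload for dest; flush once the buffer crosses the threshold.
  void send(int dest, int value) {
    out[dest].push_back(value);
    if (out[dest].size() >= kFlushSize) flush(dest);
  }

  // Hand the aggregated buffer to MPI with a synchronous-mode nonblocking send,
  // which completes only after the receiver has matched it.
  void flush(int dest) {
    if (out[dest].empty()) return;
    in_flight.push_back(std::move(out[dest]));
    out[dest].clear();
    pending.emplace_back();
    MPI_Issend(in_flight.back().data(), (int)in_flight.back().size(), MPI_INT,
               dest, kTag, MPI_COMM_WORLD, &pending.back());
  }

  // Drain any batches that have already arrived, without blocking.
  template <typename Handler>
  void poll(Handler handle) {
    while (true) {
      int flag = 0;
      MPI_Status status;
      MPI_Iprobe(MPI_ANY_SOURCE, kTag, MPI_COMM_WORLD, &flag, &status);
      if (!flag) break;
      int count = 0;
      MPI_Get_count(&status, MPI_INT, &count);
      std::vector<int> buf(count);
      MPI_Recv(buf.data(), count, MPI_INT, status.MPI_SOURCE, kTag,
               MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      for (int v : buf) handle(status.MPI_SOURCE, v);
    }
  }
};

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int rank = 0, nranks = 0;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &nranks);

  Mailbox mbox(nranks);
  long received = 0;
  auto count_it = [&](int /*src*/, int /*value*/) { ++received; };

  // Each rank queues a few values for every other rank, polling as it goes.
  for (int i = 0; i < 4; ++i)
    for (int dest = 0; dest < nranks; ++dest)
      if (dest != rank) {
        mbox.send(dest, rank * 100 + i);
        mbox.poll(count_it);
      }

  // Flush leftovers, then drain our own sends while still servicing receives
  // (blocking without polling could deadlock: synchronous-mode sends finish
  // only once the receiver matches them).
  for (int dest = 0; dest < nranks; ++dest) mbox.flush(dest);
  int sends_done = mbox.pending.empty() ? 1 : 0;
  while (!sends_done) {
    mbox.poll(count_it);
    MPI_Testall((int)mbox.pending.size(), mbox.pending.data(), &sends_done,
                MPI_STATUSES_IGNORE);
  }

  // Nonblocking-barrier termination: enter the barrier once local sends are
  // done, then keep polling until every rank has done the same.
  MPI_Request barrier;
  MPI_Ibarrier(MPI_COMM_WORLD, &barrier);
  int done = 0;
  while (!done) {
    mbox.poll(count_it);
    MPI_Test(&barrier, &done, MPI_STATUS_IGNORE);
  }

  std::printf("rank %d received %ld values\n", rank, received);
  MPI_Finalize();
  return 0;
}
```

Built with `mpicxx` and launched with, say, `mpirun -n 4`, each rank should report receiving four values from every other rank. The synchronous-mode sends (`MPI_Issend`) matter for correctness of the termination step: they complete only after the receiver matches them, so once every rank has drained its own sends and the nonblocking barrier completes, no message can still be undelivered.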