Overlapping Communication and Computation with ExaMPI's Strong Progress and Modern C++ Design
Derek Schafer, Thomas M. Hines, E. Suggs, Martin Rüfenacht, A. Skjellum
2021 Workshop on Exascale MPI (ExaMPI), November 2021. DOI: 10.1109/ExaMPI54564.2021.00008
ExaMPI is a modern, C++17+ MPI implementation designed for modularity, extensibility, and understandability. In this work, we overview functionality added to ExaMPI since its initial release, including Libfabric-based network transport support. We also explain our rationale for why and how we choose to add new MPI features (and defer others). Lastly, we measured the latency of the aforementioned transports in ExaMPI and found that ExaMPI, while having slightly higher latency than other production MPI implementations, is competitive.

It is no longer uncommon to see MPI applications issuing extra MPI calls during non-blocking MPI operations to coax MPI's progress engine. Strong, asynchronous progress (also known as application bypass) in MPI is instead based on the premise that an application asks MPI to perform a non-blocking communication in the background, and MPI completes that communication without requiring any additional MPI calls from the application to advance the underlying transport. Strong progress often requires an additional background thread, but with the current trend in exascale computing, cores appear to be in excess. Indeed, for earlier MPI implementations that supported it well, strong progress enabled overlap and reduced time to completion for some MPI applications. However, enabling or adding strong progress in existing MPI implementations is not straightforward; changing such implementations is cumbersome, difficult, invasive, and time-consuming, a key motivation for our research MPI implementation, ExaMPI. Specifically, we tested the ability of ExaMPI's strong-progress engine to overlap communication and computation, finding that considerable overlap is achieved without needing additional MPI "helper" calls such as MPI_Test.
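As an illustration of the distinction the abstract draws, the sketch below contrasts the two styles of driving a non-blocking exchange. It is not code from the paper: the buffer size, the `compute_chunk` kernel, and the `WEAK_PROGRESS` compile-time switch are hypothetical, and only standard MPI calls (`MPI_Isend`, `MPI_Irecv`, `MPI_Testall`, `MPI_Waitall`) are assumed. Under a weak-progress MPI, the application sprinkles test calls into its compute loop purely to advance the transfer; under strong progress, as ExaMPI provides, the same overlap is obtained with a single wait at the end.

```cpp
// Hypothetical sketch contrasting weak- and strong-progress usage patterns.
// Build (assumed toolchain): mpicxx -std=c++17 overlap_demo.cpp -o overlap_demo
#include <mpi.h>
#include <cstdio>
#include <vector>

// Stand-in for the application's useful work; any CPU-bound kernel fits here.
static double compute_chunk(std::vector<double>& v) {
    double acc = 0.0;
    for (double& x : v) { x = x * 1.000001 + 1e-9; acc += x; }
    return acc;
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n = 1 << 20;  // message size chosen arbitrarily for illustration
    std::vector<double> sendbuf(n, 1.0), recvbuf(n, 0.0), work(n, 2.0);
    const int peer = (rank + 1) % size;

    // Post the non-blocking exchange up front in both variants.
    MPI_Request reqs[2];
    MPI_Irecv(recvbuf.data(), n, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(sendbuf.data(), n, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[1]);

#ifdef WEAK_PROGRESS
    // Pattern 1: with a weak-progress MPI, the application interleaves
    // MPI_Testall into its compute loop just to "coax" the progress engine.
    int done = 0;
    while (!done) {
        compute_chunk(work);  // a slice of real work
        MPI_Testall(2, reqs, &done, MPI_STATUSES_IGNORE);
    }
#else
    // Pattern 2: with strong progress, the library advances the transfer on
    // a background thread; the application just computes and waits once at
    // the end, with no helper calls in between.
    for (int i = 0; i < 64; ++i)
        compute_chunk(work);
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
#endif

    if (rank == 0) std::printf("recvbuf[0] = %f\n", recvbuf[0]);
    MPI_Finalize();
    return 0;
}
```

Timing both variants, for example by wrapping the compute-plus-completion region in `MPI_Wtime`, gives a rough measure of how much of the transfer each progress model hides behind computation, which is the effect the paper evaluates for ExaMPI.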