2021 Workshop on Exascale MPI (ExaMPI): Latest Publications

Towards Modern C++ Language Support for MPI
2021 Workshop on Exascale MPI (ExaMPI) Pub Date: 2021-11-01 DOI: 10.1109/ExaMPI54564.2021.00009
Sayan Ghosh, Clara Alsobrooks, Martin Rüfenacht, A. Skjellum, P. Bangalore, A. Lumsdaine
{"title":"Towards Modern C++ Language Support for MPI","authors":"Sayan Ghosh, Clara Alsobrooks, Martin Rüfenacht, A. Skjellum, P. Bangalore, A. Lumsdaine","doi":"10.1109/ExaMPI54564.2021.00009","DOIUrl":"https://doi.org/10.1109/ExaMPI54564.2021.00009","url":null,"abstract":"The C++ programming language has made significant strides in improving performance and productivity across a broad spectrum of applications and hardware. The C++ language bindings to MPI had been deleted since MPI 3.0 (circa 2009) because it reportedly added only minimal functionality over the existing C bindings relative to modern C++ practice while incurring significant amount of maintenance to the MPI standard specification. Two years after the MPI C++ interface was eliminated, the ISO C++ 11standard was published, which paved the way for modern C++ through numerous improvements to the core language. Since then, there has been continuous enthusiasm among application developers and the MPI Forum for modern C++ bindings to MPI. In this paper, we discuss ongoing efforts of the recently formed MPI working group on language bindings in the context of providing modern C++(C++11 and beyond) support to MPI. Because of the lack of standardized bindings, C++-based MPI applications will often layer their own custom subsets of C++ Mpi functionality on top of lower-level C; application- and/or domain-specific abstractions are subsequently layered on this custom subset. From such efforts, it is apparently a challenge to devise a compact set of C++ bindings over MPI with the “right” level of abstractions to support a variety of application uses cases under the expected performance/memory constraints. However, we are convinced that it is possible to identify and eventually standardize a normative set of C++ bindings to MPI that can provide the basic functionality required by distributed-memory applications. To engage with the broader MPI and C++ communities, we discuss a prototypical interface derived from mpl, an open-source C++ 17message passing library based on MPI.","PeriodicalId":222289,"journal":{"name":"2021 Workshop on Exascale MPI (ExaMPI)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128995023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
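The abstract above describes layering modern C++ abstractions over the lower-level MPI C interface. As a minimal sketch of what such a layer typically looks like, the following shows a thin C++17 wrapper in the general spirit of libraries such as mpl; all names here (mini_mpi::environment, communicator, datatype) are invented for illustration and are neither mpl's actual API nor the working group's proposed bindings. Only standard MPI C calls are used underneath.

```cpp
// Illustrative sketch only: a thin C++17 wrapper over the MPI C API.
// The wrapper names are hypothetical; the underlying MPI calls are standard.
#include <mpi.h>
#include <type_traits>
#include <vector>

namespace mini_mpi {

// Map a few C++ arithmetic types onto MPI datatypes at compile time.
template <typename T>
MPI_Datatype datatype() {
  if constexpr (std::is_same_v<T, int>)         return MPI_INT;
  else if constexpr (std::is_same_v<T, float>)  return MPI_FLOAT;
  else if constexpr (std::is_same_v<T, double>) return MPI_DOUBLE;
  else static_assert(sizeof(T) == 0, "type not supported in this sketch");
}

// RAII ownership of MPI_Init/MPI_Finalize.
struct environment {
  environment(int& argc, char**& argv) { MPI_Init(&argc, &argv); }
  ~environment() { MPI_Finalize(); }
};

class communicator {
  MPI_Comm comm_;
public:
  explicit communicator(MPI_Comm c = MPI_COMM_WORLD) : comm_(c) {}
  int rank() const { int r; MPI_Comm_rank(comm_, &r); return r; }
  int size() const { int s; MPI_Comm_size(comm_, &s); return s; }

  template <typename T>
  void send(const std::vector<T>& data, int dest, int tag = 0) const {
    MPI_Send(data.data(), static_cast<int>(data.size()), datatype<T>(),
             dest, tag, comm_);
  }
  template <typename T>
  void recv(std::vector<T>& data, int source, int tag = 0) const {
    MPI_Recv(data.data(), static_cast<int>(data.size()), datatype<T>(),
             source, tag, comm_, MPI_STATUS_IGNORE);
  }
};

} // namespace mini_mpi

int main(int argc, char** argv) {
  mini_mpi::environment env(argc, argv);
  mini_mpi::communicator world;
  std::vector<double> buf(4, world.rank());
  if (world.rank() == 0 && world.size() > 1) world.send(buf, 1);
  else if (world.rank() == 1)                world.recv(buf, 0);
  return 0;
}
```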
Leveraging Interconnect QoS Capabilities for Congestion-Aware MPI Communication
2021 Workshop on Exascale MPI (ExaMPI) Pub Date: 2021-11-01 DOI: 10.1109/ExaMPI54564.2021.00006
M. Khalilov, Aliaksei Slinka, Q. Zhang
{"title":"Leveraging Interconnect QoS Capabilities for Congestion-Aware MPI Communication","authors":"M. Khalilov, Aliaksei Slinka, Q. Zhang","doi":"10.1109/ExaMPI54564.2021.00006","DOIUrl":"https://doi.org/10.1109/ExaMPI54564.2021.00006","url":null,"abstract":"Resource sharing between job allocations on production supercomputers often leads to traffic interference between MPI applications. Network congestion, a side effect of such resource sharing, results in substantial MPI latency degradation, making application performance unpredictable. We consider the traffic isolation capabilities of modern RDMA interconnects and propose a generic Priority Assignment algorithm that associates a priority with each network Send operation using information about the network latency in the past. Our implementation of Priority Assignment algorithm in UCX framework shows up to the 22 x GPCNeT benchmark improvement on 64 node cluster with InfiniBand EDR and OpenMPI in comparison to the reference UCX implementation. Packet-level simulation shows that proposed Priority Assignment algorithm helps to mitigate the effects of network congestion scaling up to the 512 nodes.","PeriodicalId":222289,"journal":{"name":"2021 Workshop on Exascale MPI (ExaMPI)","volume":"146 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113987272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
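The abstract describes assigning a network priority to each send based on latency observed in the past. The following standalone sketch illustrates that general idea under stated assumptions: four QoS levels and a caller-maintained latency history, with thresholds invented for the example. It is not the authors' UCX implementation or their actual algorithm.

```cpp
// Toy illustration of latency-history-based priority assignment.
// Assumptions (not from the paper): 4 priority levels, and a priority
// chosen by comparing a message's recent latency to the running average.
#include <cstddef>
#include <deque>
#include <numeric>

class PriorityAssigner {
  std::deque<double> history_;   // recent per-message latencies (microseconds)
  std::size_t window_;
  int num_levels_;
public:
  explicit PriorityAssigner(std::size_t window = 128, int num_levels = 4)
      : window_(window), num_levels_(num_levels) {}

  // Record the measured latency of a completed send.
  void observe(double latency_us) {
    history_.push_back(latency_us);
    if (history_.size() > window_) history_.pop_front();
  }

  // Pick a priority (0 = highest) for the next send: messages that have
  // recently seen worse-than-average latency are promoted so they are
  // less affected by congestion on shared links.
  int next_priority(double last_latency_us) const {
    if (history_.empty()) return num_levels_ - 1;   // no data yet: lowest
    double avg = std::accumulate(history_.begin(), history_.end(), 0.0) /
                 static_cast<double>(history_.size());
    double ratio = last_latency_us / avg;
    if (ratio > 2.0) return 0;                      // heavily delayed
    if (ratio > 1.5) return 1;
    if (ratio > 1.0) return 2;
    return num_levels_ - 1;                         // at or below average
  }
};
```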
Accelerating Multi-Process Communication for Parallel 3-D FFT
2021 Workshop on Exascale MPI (ExaMPI) Pub Date: 2021-11-01 DOI: 10.1109/ExaMPI54564.2021.00011
Alan Ayala, S. Tomov, M. Stoyanov, A. Haidar, J. Dongarra
{"title":"Accelerating Multi - Process Communication for Parallel 3-D FFT","authors":"Alan Ayala, S. Tomov, M. Stoyanov, A. Haidar, J. Dongarra","doi":"10.1109/ExaMPI54564.2021.00011","DOIUrl":"https://doi.org/10.1109/ExaMPI54564.2021.00011","url":null,"abstract":"Today largest and most powerful supercomputers in the world are built on heterogeneous platforms; and using the combined power of multi-core CPUs and GPUs, has had a great impact accelerating large-scale applications. However, on these architectures, parallel algorithms, such as the Fast Fourier Transform (FFT), encounter that inter-processor communication become a bottleneck and limits their scalability. In this paper, we present techniques for speeding up multi-process communication cost during the computation of FFTs, considering hybrid network connections as those expected on upcoming exascale machines. Among our techniques, we present algorithmic tuning, making use of phase diagrams; parametric tuning, using different FFT settings; and MPI distribution tuning based on FFT size and computational resources available. We present several experiments obtained on Summit supercomputer at Oak Ridge National Laboratory, using up to 40,960 IBM Power9 cores and 6,144 NVIDIA V-100 GPUs.","PeriodicalId":222289,"journal":{"name":"2021 Workshop on Exascale MPI (ExaMPI)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120947812","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
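One of the tuning knobs mentioned above is the MPI process decomposition chosen for the distributed 3-D FFT. As a hedged sketch (not the authors' code), the following factors the available ranks into a 2-D "pencil" process grid using the standard MPI_Dims_create routine; the divisibility adjustment is an invented heuristic to illustrate size-aware grid selection, not the tuning procedure from the paper.

```cpp
// Sketch: choose a 2-D process grid for a pencil-decomposed 3-D FFT.
// MPI_Dims_create is standard MPI; the divisibility heuristic is illustrative.
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int nprocs, rank;
  MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  const int nx = 1024, ny = 1024, nz = 1024;   // example global FFT size

  // Start from a balanced factorization of nprocs into P x Q.
  int dims[2] = {0, 0};
  MPI_Dims_create(nprocs, 2, dims);

  // Toy adjustment: if the balanced grid does not divide the transform
  // sizes evenly, try swapping the two factors.
  if ((ny % dims[0] != 0 || nz % dims[1] != 0) &&
      (ny % dims[1] == 0 && nz % dims[0] == 0)) {
    int tmp = dims[0]; dims[0] = dims[1]; dims[1] = tmp;
  }

  if (rank == 0)
    std::printf("FFT %dx%dx%d on a %d x %d process grid\n",
                nx, ny, nz, dims[0], dims[1]);

  MPI_Finalize();
  return 0;
}
```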
Partitioned Collective Communication
2021 Workshop on Exascale MPI (ExaMPI) Pub Date: 2021-11-01 DOI: 10.1109/ExaMPI54564.2021.00007
Daniel J. Holmes, A. Skjellum, Julien Jaeger, Ryan E. Grant, P. Bangalore, Matthew G. F. Dosanjh, Amanda Bienz, Derek Schafer
{"title":"Partitioned Collective Communication","authors":"Daniel J. Holmes, A. Skjellum, Julien Jaeger, Ryan E. Grant, P. Bangalore, Matthew G. F. Dosanjh, Amanda Bienz, Derek Schafer","doi":"10.1109/ExaMPI54564.2021.00007","DOIUrl":"https://doi.org/10.1109/ExaMPI54564.2021.00007","url":null,"abstract":"Partitioned point-to-point communication and persistent collective communication were both recently standardized in MPI-4.0. Each offers performance and scalability advantages over MPI-3.1-based communication when planned transfers are feasible in an MPI application. Their merger into a generalized, persistent collective communication with partitions is a logical next step, with significant advantages for performance portability. Non-trivial decisions about the syntax and semantics of such operations need to be addressed, including scope of knowledge of partitioning choices by members of the communicator's group(s). This paper introduces and motivates proposed interfaces for partitioned collective communication. Partitioned collectives will be particularly useful for multithreaded, accelerator-offloaded, and/or hardware-collective-enhanced MPI implementations driving suitable applications, as well as for pipelined collective communication (e.g., partitioned allreduce) with single consumers and producers per MPI process. These operations also provide load imbalance mitigation. Halo exchange codes arising from regular and irregular grid/mesh applications are a key candidate class of applications for this functionality. Generalizations of lightweight notification procedures MPI_Parrived and MPI_Pready are considered. Generalization of MPIX_Pbuf_prepare, a procedure proposed for MPI-4.1 for point-to-point partitioned communication, are also considered, shown in context of supporting ready-mode send semantics for the operations. The option of providing local and incomplete modes for initialization procedures is mentioned (which could also apply to persistent collective operations); these semantics interact with the MPIX_Pbuf_prepare concept and the progress rule. Last, future work is outlined, indicating prerequisites for formal consideration for the MPI-5 standard.","PeriodicalId":222289,"journal":{"name":"2021 Workshop on Exascale MPI (ExaMPI)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129232024","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
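For readers unfamiliar with the MPI-4.0 partitioned point-to-point operations that this proposal generalizes, the minimal sketch below shows the standardized MPI_Psend_init / MPI_Pready / MPI_Precv_init / MPI_Parrived pattern between two ranks (partition counts and sizes are example values; run with at least two ranks on an MPI-4.0-capable library). The proposed partitioned collectives would apply the same partition-and-notify model to collective operations; their interfaces are not shown here.

```cpp
// MPI-4.0 partitioned point-to-point example: rank 0 sends a buffer in
// 4 partitions, marking each ready as it is produced; rank 1 polls
// individual partitions with MPI_Parrived.
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  const int partitions = 4;
  const MPI_Count count_per_partition = 1024;   // doubles per partition
  std::vector<double> buf(partitions * count_per_partition);

  if (rank == 0) {
    MPI_Request req;
    MPI_Psend_init(buf.data(), partitions, count_per_partition, MPI_DOUBLE,
                   1, 0, MPI_COMM_WORLD, MPI_INFO_NULL, &req);
    MPI_Start(&req);
    for (int p = 0; p < partitions; ++p) {
      // ... fill partition p of buf (e.g., one thread or loop iteration) ...
      MPI_Pready(p, req);                        // this partition may be sent
    }
    MPI_Wait(&req, MPI_STATUS_IGNORE);
    MPI_Request_free(&req);
  } else if (rank == 1) {
    MPI_Request req;
    MPI_Precv_init(buf.data(), partitions, count_per_partition, MPI_DOUBLE,
                   0, 0, MPI_COMM_WORLD, MPI_INFO_NULL, &req);
    MPI_Start(&req);
    for (int p = 0; p < partitions; ++p) {
      int flag = 0;
      while (!flag)
        MPI_Parrived(req, p, &flag);             // poll for this partition
      // ... consume partition p of buf ...
    }
    MPI_Wait(&req, MPI_STATUS_IGNORE);
    MPI_Request_free(&req);
  }

  MPI_Finalize();
  return 0;
}
```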
Overlapping Communication and Computation with ExaMPI's Strong Progress and Modern C++ Design
2021 Workshop on Exascale MPI (ExaMPI) Pub Date: 2021-11-01 DOI: 10.1109/ExaMPI54564.2021.00008
Derek Schafer, Thomas M. Hines, E. Suggs, Martin Rüfenacht, A. Skjellum
{"title":"Overlapping Communication and Computation with ExaMPI's Strong Progress and Modern C++ Design","authors":"Derek Schafer, Thomas M. Hines, E. Suggs, Martin Rüfenacht, A. Skjellum","doi":"10.1109/ExaMPI54564.2021.00008","DOIUrl":"https://doi.org/10.1109/ExaMPI54564.2021.00008","url":null,"abstract":"ExaMPI is a modern, C++17+ Mpi implementation designed for modularity, extensibility, and understandability. In this work, we overview functionality new to ExaMPI since its initial release, including Libfabric-based network transport support. We also explain our rationale for why and how we choose to add new MPI features (and defer others). Lastly, we measured the latency of the aforementioned transports in ExaMPI and found that ExaMPI, while having slightly higher latency than other production MPI's, is competitive. It is no longer uncommon to see MPI applications using extra MPI calls during non-blocking MPI operations to coax MPI's progress engine. Strong, asynchronous progress (aka application bypass) in MPI is instead based on the premise that an application asks MPI to perform a non-blocking communication in the background and MPI completes said communication without requiring any additional MPI calls from the application to advance the underlying transport. Strong progress often requires an additional background thread, but with the current trend in exascale computing, cores appear to be in excess. Indeed, for earlier MPI implementations that supported it well, strong progress enabled overlap and reduced time to completion for some MPI applications. However, enabling or adding strong progress to existing MPI implementations is not straightforward; changing such implementations is cumbersome, difficult, invasive, and time-consuming-a key motivation for our research MPI implementation, ExaMPI. Specifically, we tested the ability for ExaMPI's strong progress engine to enable overlap communication and computation, finding that considerable overlap is achieved without needing additional MPI “helper” calls such as MPI Test.","PeriodicalId":222289,"journal":{"name":"2021 Workshop on Exascale MPI (ExaMPI)","volume":"21 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121013241","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
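The overlap pattern the abstract refers to is generic: post non-blocking operations, do independent computation, and only then wait, with no intermediate "helper" calls. The sketch below shows that pattern with standard MPI calls and example buffer sizes; it is not ExaMPI-specific code, and whether the transfers actually progress during compute() depends on the MPI library's progress engine.

```cpp
// Generic overlap pattern: post MPI_Isend/MPI_Irecv, compute, then Waitall.
// With strong asynchronous progress the transfers advance in the background
// during compute(); with weak progress they may only advance inside Waitall.
#include <mpi.h>
#include <vector>

static void compute(std::vector<double>& local) {
  // Stand-in for application work that does not touch the message buffers.
  for (double& x : local) x = x * 0.5 + 1.0;
}

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  const int n = 1 << 20;                        // example message size
  std::vector<double> sendbuf(n, rank), recvbuf(n), local(n, 1.0);
  int right = (rank + 1) % size, left = (rank - 1 + size) % size;

  MPI_Request reqs[2];
  MPI_Irecv(recvbuf.data(), n, MPI_DOUBLE, left, 0, MPI_COMM_WORLD, &reqs[0]);
  MPI_Isend(sendbuf.data(), n, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

  compute(local);                               // no MPI_Test calls here

  MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
  MPI_Finalize();
  return 0;
}
```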
[Copyright notice]
2021 Workshop on Exascale MPI (ExaMPI) Pub Date: 2021-11-01 DOI: 10.1109/exampi54564.2021.00002
{"title":"[Copyright notice]","authors":"","doi":"10.1109/exampi54564.2021.00002","DOIUrl":"https://doi.org/10.1109/exampi54564.2021.00002","url":null,"abstract":"","PeriodicalId":222289,"journal":{"name":"2021 Workshop on Exascale MPI (ExaMPI)","volume":"109 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126107401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Proceedings of ExaMPI 2021: Workshop on Exascale MPI [Title page]
2021 Workshop on Exascale MPI (ExaMPI) Pub Date: 2021-11-01 DOI: 10.1109/exampi54564.2021.00001
{"title":"Proceedings of ExaMPI 2021: Workshop on Exascale MPI [Title page]","authors":"","doi":"10.1109/exampi54564.2021.00001","DOIUrl":"https://doi.org/10.1109/exampi54564.2021.00001","url":null,"abstract":"","PeriodicalId":222289,"journal":{"name":"2021 Workshop on Exascale MPI (ExaMPI)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128208352","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Message from the Workshop Chairs
2021 Workshop on Exascale MPI (ExaMPI) Pub Date: 2021-11-01 DOI: 10.1109/exampi54564.2021.00004
W. Scullin, N. Banglawala, Rosa M. Badia, James Clark
{"title":"Message from the Workshop Chairs","authors":"W. Scullin, N. Banglawala, Rosa M. Badia, James Clark","doi":"10.1109/exampi54564.2021.00004","DOIUrl":"https://doi.org/10.1109/exampi54564.2021.00004","url":null,"abstract":"This workshop’s program includes a keynote, two invited talks, six papers and four lightning talks. Paper topics include enabling shared memory access to Python processes in task-based programming models; workarounds for Python workflows in HPC environments; experiences in developing a distributed Agent-Based modelling (ABM) distributed toolkit in Python; new contributions to a distributed, asynchronous many-task (AMT) computing framework that encompasses the entire computing process, from a Jupyter front-end for managing code and results to the collection and visualization of performance data; a new high-performance Python API with a C++ core to represent data as a table and provide distributed data operations; and a computation environment for HPC that aims to accelerate microstructural analytics scaling Numpy workflows to enable multidimensional image analysis of diverse specimens.","PeriodicalId":222289,"journal":{"name":"2021 Workshop on Exascale MPI (ExaMPI)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126980316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A FACT-based Approach: Making Machine Learning Collective Autotuning Feasible on Exascale Systems
2021 Workshop on Exascale MPI (ExaMPI) Pub Date: 2021-11-01 DOI: 10.1109/ExaMPI54564.2021.00010
Michael Wilkins, Yanfei Guo, R. Thakur, N. Hardavellas, P. Dinda, Min Si
{"title":"A FACT-based Approach: Making Machine Learning Collective Autotuning Feasible on Exascale Systems","authors":"Michael Wilkins, Yanfei Guo, R. Thakur, N. Hardavellas, P. Dinda, Min Si","doi":"10.1109/ExaMPI54564.2021.00010","DOIUrl":"https://doi.org/10.1109/ExaMPI54564.2021.00010","url":null,"abstract":"According to recent performance analyses, MPI collective operations make up a quarter of the execution time on production systems. Machine learning (ML) autotuners use supervised learning to select collective algorithms, significantly improving collective performance. However, we observe two barriers preventing their adoption over the default heuristic-based autotuners. First, a user may find it difficult to compare autotuners because we lack a methodology to quantify their performance. We call this the performance quantification challenge. Second, to obtain the advertised performance, ML model training requires benchmark data from a vast majority of the feature space. Collecting such data regularly on large scale systems consumes far too much time and resources, and this will only get worse with exascale systems. We refer to this as the training data collection challenge. To address these challenges, we contribute (1) a performance evaluation framework to compare and improve collective au-totuner designs and (2) the Feature scaling, Active learning, Converge, Tune hyperparameters (FACT) approach, a three-part methodology to minimize the training data collection time (and thus maximize practicality at larger scale) without sacrificing accuracy. In the methodology, we first preprocess feature and output values based on domain knowledge. Then, we use active learning to iteratively collect only necessary training data points. Lastly, we perform hyperparameter tuning to further improve model accuracy without any additional data. On a production scale system, our methodology produces a model of equal accuracy using 6.88x less training data collection time.","PeriodicalId":222289,"journal":{"name":"2021 Workshop on Exascale MPI (ExaMPI)","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129655615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
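The centerpiece of the methodology above is active learning: rather than benchmarking the whole feature space, only the configurations the current model is least certain about are measured. The toy sketch below illustrates that loop with an invented uncertainty proxy (distance to the nearest already-measured configuration) and a fake benchmark function; it is not the FACT pipeline, its model, or its feature set.

```cpp
// Toy active-learning loop: repeatedly measure the candidate configuration
// that is least covered by existing training data, instead of benchmarking
// every point. The feature layout and uncertainty proxy are invented.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

struct Config { double log_msg_size; double log_num_ranks; };

static double dist(const Config& a, const Config& b) {
  double dx = a.log_msg_size - b.log_msg_size;
  double dy = a.log_num_ranks - b.log_num_ranks;
  return std::sqrt(dx * dx + dy * dy);
}

// Stand-in for running the collective benchmark at one configuration.
static double benchmark(const Config& c) {
  return c.log_msg_size + 0.1 * c.log_num_ranks;   // fake latency value
}

int main() {
  std::vector<Config> candidates;
  for (int s = 0; s < 20; ++s)
    for (int r = 0; r < 10; ++r)
      candidates.push_back({static_cast<double>(s), static_cast<double>(r)});

  std::vector<Config> measured = {candidates.front()};
  std::vector<double> labels   = {benchmark(candidates.front())};

  const std::size_t budget = 16;                   // affordable measurements
  while (measured.size() < budget) {
    // Pick the candidate farthest from all measured points.
    std::size_t best = 0;
    double best_gap = -1.0;
    for (std::size_t i = 0; i < candidates.size(); ++i) {
      double nearest = 1e300;
      for (const Config& m : measured)
        nearest = std::min(nearest, dist(candidates[i], m));
      if (nearest > best_gap) { best_gap = nearest; best = i; }
    }
    measured.push_back(candidates[best]);
    labels.push_back(benchmark(candidates[best]));
  }
  std::printf("collected %zu training points out of %zu candidates\n",
              measured.size(), candidates.size());
  return 0;
}
```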