2021 Workshop on Exascale MPI (ExaMPI): Latest Publications

Towards Modern C++ Language Support for MPI
2021 Workshop on Exascale MPI (ExaMPI) Pub Date: 2021-11-01 DOI: 10.1109/ExaMPI54564.2021.00009
Sayan Ghosh, Clara Alsobrooks, Martin Rüfenacht, A. Skjellum, P. Bangalore, A. Lumsdaine
{"title":"Towards Modern C++ Language Support for MPI","authors":"Sayan Ghosh, Clara Alsobrooks, Martin Rüfenacht, A. Skjellum, P. Bangalore, A. Lumsdaine","doi":"10.1109/ExaMPI54564.2021.00009","DOIUrl":"https://doi.org/10.1109/ExaMPI54564.2021.00009","url":null,"abstract":"The C++ programming language has made significant strides in improving performance and productivity across a broad spectrum of applications and hardware. The C++ language bindings to MPI had been deleted since MPI 3.0 (circa 2009) because it reportedly added only minimal functionality over the existing C bindings relative to modern C++ practice while incurring significant amount of maintenance to the MPI standard specification. Two years after the MPI C++ interface was eliminated, the ISO C++ 11standard was published, which paved the way for modern C++ through numerous improvements to the core language. Since then, there has been continuous enthusiasm among application developers and the MPI Forum for modern C++ bindings to MPI. In this paper, we discuss ongoing efforts of the recently formed MPI working group on language bindings in the context of providing modern C++(C++11 and beyond) support to MPI. Because of the lack of standardized bindings, C++-based MPI applications will often layer their own custom subsets of C++ Mpi functionality on top of lower-level C; application- and/or domain-specific abstractions are subsequently layered on this custom subset. From such efforts, it is apparently a challenge to devise a compact set of C++ bindings over MPI with the “right” level of abstractions to support a variety of application uses cases under the expected performance/memory constraints. However, we are convinced that it is possible to identify and eventually standardize a normative set of C++ bindings to MPI that can provide the basic functionality required by distributed-memory applications. To engage with the broader MPI and C++ communities, we discuss a prototypical interface derived from mpl, an open-source C++ 17message passing library based on MPI.","PeriodicalId":222289,"journal":{"name":"2021 Workshop on Exascale MPI (ExaMPI)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128995023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
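The abstract above describes layering modern C++ abstractions over the lower-level MPI C interface. As a minimal sketch of what such a layer typically looks like, the following shows a thin C++17 wrapper in the general spirit of libraries such as mpl; all names here (mini_mpi::environment, communicator, datatype) are invented for illustration and are neither mpl's actual API nor the working group's proposed bindings. Only standard MPI C calls are used underneath.

```cpp
// Illustrative sketch only: a thin C++17 wrapper over the MPI C API.
// The wrapper names are hypothetical; the underlying MPI calls are standard.
#include <mpi.h>
#include <type_traits>
#include <vector>

namespace mini_mpi {

// Map a few C++ arithmetic types onto MPI datatypes at compile time.
template <typename T>
MPI_Datatype datatype() {
  if constexpr (std::is_same_v<T, int>)         return MPI_INT;
  else if constexpr (std::is_same_v<T, float>)  return MPI_FLOAT;
  else if constexpr (std::is_same_v<T, double>) return MPI_DOUBLE;
  else static_assert(sizeof(T) == 0, "type not supported in this sketch");
}

// RAII ownership of MPI_Init/MPI_Finalize.
struct environment {
  environment(int& argc, char**& argv) { MPI_Init(&argc, &argv); }
  ~environment() { MPI_Finalize(); }
};

class communicator {
  MPI_Comm comm_;
public:
  explicit communicator(MPI_Comm c = MPI_COMM_WORLD) : comm_(c) {}
  int rank() const { int r; MPI_Comm_rank(comm_, &r); return r; }
  int size() const { int s; MPI_Comm_size(comm_, &s); return s; }

  template <typename T>
  void send(const std::vector<T>& data, int dest, int tag = 0) const {
    MPI_Send(data.data(), static_cast<int>(data.size()), datatype<T>(),
             dest, tag, comm_);
  }
  template <typename T>
  void recv(std::vector<T>& data, int source, int tag = 0) const {
    MPI_Recv(data.data(), static_cast<int>(data.size()), datatype<T>(),
             source, tag, comm_, MPI_STATUS_IGNORE);
  }
};

} // namespace mini_mpi

int main(int argc, char** argv) {
  mini_mpi::environment env(argc, argv);
  mini_mpi::communicator world;
  std::vector<double> buf(4, world.rank());
  if (world.rank() == 0 && world.size() > 1) world.send(buf, 1);
  else if (world.rank() == 1)                world.recv(buf, 0);
  return 0;
}
```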
Leveraging Interconnect QoS Capabilities for Congestion-Aware MPI Communication
2021 Workshop on Exascale MPI (ExaMPI) Pub Date: 2021-11-01 DOI: 10.1109/ExaMPI54564.2021.00006
M. Khalilov, Aliaksei Slinka, Q. Zhang
{"title":"Leveraging Interconnect QoS Capabilities for Congestion-Aware MPI Communication","authors":"M. Khalilov, Aliaksei Slinka, Q. Zhang","doi":"10.1109/ExaMPI54564.2021.00006","DOIUrl":"https://doi.org/10.1109/ExaMPI54564.2021.00006","url":null,"abstract":"Resource sharing between job allocations on production supercomputers often leads to traffic interference between MPI applications. Network congestion, a side effect of such resource sharing, results in substantial MPI latency degradation, making application performance unpredictable. We consider the traffic isolation capabilities of modern RDMA interconnects and propose a generic Priority Assignment algorithm that associates a priority with each network Send operation using information about the network latency in the past. Our implementation of Priority Assignment algorithm in UCX framework shows up to the 22 x GPCNeT benchmark improvement on 64 node cluster with InfiniBand EDR and OpenMPI in comparison to the reference UCX implementation. Packet-level simulation shows that proposed Priority Assignment algorithm helps to mitigate the effects of network congestion scaling up to the 512 nodes.","PeriodicalId":222289,"journal":{"name":"2021 Workshop on Exascale MPI (ExaMPI)","volume":"146 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113987272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
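The abstract describes assigning a network priority to each send based on latency observed in the past. The following standalone sketch illustrates that general idea under stated assumptions: four QoS levels and a caller-maintained latency history, with thresholds invented for the example. It is not the authors' UCX implementation or their actual algorithm.

```cpp
// Toy illustration of latency-history-based priority assignment.
// Assumptions (not from the paper): 4 priority levels, and a priority
// chosen by comparing a message's recent latency to the running average.
#include <cstddef>
#include <deque>
#include <numeric>

class PriorityAssigner {
  std::deque<double> history_;   // recent per-message latencies (microseconds)
  std::size_t window_;
  int num_levels_;
public:
  explicit PriorityAssigner(std::size_t window = 128, int num_levels = 4)
      : window_(window), num_levels_(num_levels) {}

  // Record the measured latency of a completed send.
  void observe(double latency_us) {
    history_.push_back(latency_us);
    if (history_.size() > window_) history_.pop_front();
  }

  // Pick a priority (0 = highest) for the next send: messages that have
  // recently seen worse-than-average latency are promoted so they are
  // less affected by congestion on shared links.
  int next_priority(double last_latency_us) const {
    if (history_.empty()) return num_levels_ - 1;   // no data yet: lowest
    double avg = std::accumulate(history_.begin(), history_.end(), 0.0) /
                 static_cast<double>(history_.size());
    double ratio = last_latency_us / avg;
    if (ratio > 2.0) return 0;                      // heavily delayed
    if (ratio > 1.5) return 1;
    if (ratio > 1.0) return 2;
    return num_levels_ - 1;                         // at or below average
  }
};
```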
Accelerating Multi-Process Communication for Parallel 3-D FFT
2021 Workshop on Exascale MPI (ExaMPI) Pub Date: 2021-11-01 DOI: 10.1109/ExaMPI54564.2021.00011
Alan Ayala, S. Tomov, M. Stoyanov, A. Haidar, J. Dongarra
{"title":"Accelerating Multi - Process Communication for Parallel 3-D FFT","authors":"Alan Ayala, S. Tomov, M. Stoyanov, A. Haidar, J. Dongarra","doi":"10.1109/ExaMPI54564.2021.00011","DOIUrl":"https://doi.org/10.1109/ExaMPI54564.2021.00011","url":null,"abstract":"Today largest and most powerful supercomputers in the world are built on heterogeneous platforms; and using the combined power of multi-core CPUs and GPUs, has had a great impact accelerating large-scale applications. However, on these architectures, parallel algorithms, such as the Fast Fourier Transform (FFT), encounter that inter-processor communication become a bottleneck and limits their scalability. In this paper, we present techniques for speeding up multi-process communication cost during the computation of FFTs, considering hybrid network connections as those expected on upcoming exascale machines. Among our techniques, we present algorithmic tuning, making use of phase diagrams; parametric tuning, using different FFT settings; and MPI distribution tuning based on FFT size and computational resources available. We present several experiments obtained on Summit supercomputer at Oak Ridge National Laboratory, using up to 40,960 IBM Power9 cores and 6,144 NVIDIA V-100 GPUs.","PeriodicalId":222289,"journal":{"name":"2021 Workshop on Exascale MPI (ExaMPI)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120947812","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
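One of the tuning knobs mentioned above is the MPI process decomposition chosen for the distributed 3-D FFT. As a hedged sketch (not the authors' code), the following factors the available ranks into a 2-D "pencil" process grid using the standard MPI_Dims_create routine; the divisibility adjustment is an invented heuristic to illustrate size-aware grid selection, not the tuning procedure from the paper.

```cpp
// Sketch: choose a 2-D process grid for a pencil-decomposed 3-D FFT.
// MPI_Dims_create is standard MPI; the divisibility heuristic is illustrative.
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int nprocs, rank;
  MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  const int nx = 1024, ny = 1024, nz = 1024;   // example global FFT size

  // Start from a balanced factorization of nprocs into P x Q.
  int dims[2] = {0, 0};
  MPI_Dims_create(nprocs, 2, dims);

  // Toy adjustment: if the balanced grid does not divide the transform
  // sizes evenly, try swapping the two factors.
  if ((ny % dims[0] != 0 || nz % dims[1] != 0) &&
      (ny % dims[1] == 0 && nz % dims[0] == 0)) {
    int tmp = dims[0]; dims[0] = dims[1]; dims[1] = tmp;
  }

  if (rank == 0)
    std::printf("FFT %dx%dx%d on a %d x %d process grid\n",
                nx, ny, nz, dims[0], dims[1]);

  MPI_Finalize();
  return 0;
}
```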
Partitioned Collective Communication
2021 Workshop on Exascale MPI (ExaMPI) Pub Date: 2021-11-01 DOI: 10.1109/ExaMPI54564.2021.00007
Daniel J. Holmes, A. Skjellum, Julien Jaeger, Ryan E. Grant, P. Bangalore, Matthew G. F. Dosanjh, Amanda Bienz, Derek Schafer
{"title":"Partitioned Collective Communication","authors":"Daniel J. Holmes, A. Skjellum, Julien Jaeger, Ryan E. Grant, P. Bangalore, Matthew G. F. Dosanjh, Amanda Bienz, Derek Schafer","doi":"10.1109/ExaMPI54564.2021.00007","DOIUrl":"https://doi.org/10.1109/ExaMPI54564.2021.00007","url":null,"abstract":"Partitioned point-to-point communication and persistent collective communication were both recently standardized in MPI-4.0. Each offers performance and scalability advantages over MPI-3.1-based communication when planned transfers are feasible in an MPI application. Their merger into a generalized, persistent collective communication with partitions is a logical next step, with significant advantages for performance portability. Non-trivial decisions about the syntax and semantics of such operations need to be addressed, including scope of knowledge of partitioning choices by members of the communicator's group(s). This paper introduces and motivates proposed interfaces for partitioned collective communication. Partitioned collectives will be particularly useful for multithreaded, accelerator-offloaded, and/or hardware-collective-enhanced MPI implementations driving suitable applications, as well as for pipelined collective communication (e.g., partitioned allreduce) with single consumers and producers per MPI process. These operations also provide load imbalance mitigation. Halo exchange codes arising from regular and irregular grid/mesh applications are a key candidate class of applications for this functionality. Generalizations of lightweight notification procedures MPI_Parrived and MPI_Pready are considered. Generalization of MPIX_Pbuf_prepare, a procedure proposed for MPI-4.1 for point-to-point partitioned communication, are also considered, shown in context of supporting ready-mode send semantics for the operations. The option of providing local and incomplete modes for initialization procedures is mentioned (which could also apply to persistent collective operations); these semantics interact with the MPIX_Pbuf_prepare concept and the progress rule. Last, future work is outlined, indicating prerequisites for formal consideration for the MPI-5 standard.","PeriodicalId":222289,"journal":{"name":"2021 Workshop on Exascale MPI (ExaMPI)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129232024","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
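For readers unfamiliar with the MPI-4.0 partitioned point-to-point operations that this proposal generalizes, the minimal sketch below shows the standardized MPI_Psend_init / MPI_Pready / MPI_Precv_init / MPI_Parrived pattern between two ranks (partition counts and sizes are example values; run with at least two ranks on an MPI-4.0-capable library). The proposed partitioned collectives would apply the same partition-and-notify model to collective operations; their interfaces are not shown here.

```cpp
// MPI-4.0 partitioned point-to-point example: rank 0 sends a buffer in
// 4 partitions, marking each ready as it is produced; rank 1 polls
// individual partitions with MPI_Parrived.
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  const int partitions = 4;
  const MPI_Count count_per_partition = 1024;   // doubles per partition
  std::vector<double> buf(partitions * count_per_partition);

  if (rank == 0) {
    MPI_Request req;
    MPI_Psend_init(buf.data(), partitions, count_per_partition, MPI_DOUBLE,
                   1, 0, MPI_COMM_WORLD, MPI_INFO_NULL, &req);
    MPI_Start(&req);
    for (int p = 0; p < partitions; ++p) {
      // ... fill partition p of buf (e.g., one thread or loop iteration) ...
      MPI_Pready(p, req);                        // this partition may be sent
    }
    MPI_Wait(&req, MPI_STATUS_IGNORE);
    MPI_Request_free(&req);
  } else if (rank == 1) {
    MPI_Request req;
    MPI_Precv_init(buf.data(), partitions, count_per_partition, MPI_DOUBLE,
                   0, 0, MPI_COMM_WORLD, MPI_INFO_NULL, &req);
    MPI_Start(&req);
    for (int p = 0; p < partitions; ++p) {
      int flag = 0;
      while (!flag)
        MPI_Parrived(req, p, &flag);             // poll for this partition
      // ... consume partition p of buf ...
    }
    MPI_Wait(&req, MPI_STATUS_IGNORE);
    MPI_Request_free(&req);
  }

  MPI_Finalize();
  return 0;
}
```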
Overlapping Communication and Computation with ExaMPI's Strong Progress and Modern C++ Design
2021 Workshop on Exascale MPI (ExaMPI) Pub Date: 2021-11-01 DOI: 10.1109/ExaMPI54564.2021.00008
Derek Schafer, Thomas M. Hines, E. Suggs, Martin Rüfenacht, A. Skjellum
{"title":"Overlapping Communication and Computation with ExaMPI's Strong Progress and Modern C++ Design","authors":"Derek Schafer, Thomas M. Hines, E. Suggs, Martin Rüfenacht, A. Skjellum","doi":"10.1109/ExaMPI54564.2021.00008","DOIUrl":"https://doi.org/10.1109/ExaMPI54564.2021.00008","url":null,"abstract":"ExaMPI is a modern, C++17+ Mpi implementation designed for modularity, extensibility, and understandability. In this work, we overview functionality new to ExaMPI since its initial release, including Libfabric-based network transport support. We also explain our rationale for why and how we choose to add new MPI features (and defer others). Lastly, we measured the latency of the aforementioned transports in ExaMPI and found that ExaMPI, while having slightly higher latency than other production MPI's, is competitive. It is no longer uncommon to see MPI applications using extra MPI calls during non-blocking MPI operations to coax MPI's progress engine. Strong, asynchronous progress (aka application bypass) in MPI is instead based on the premise that an application asks MPI to perform a non-blocking communication in the background and MPI completes said communication without requiring any additional MPI calls from the application to advance the underlying transport. Strong progress often requires an additional background thread, but with the current trend in exascale computing, cores appear to be in excess. Indeed, for earlier MPI implementations that supported it well, strong progress enabled overlap and reduced time to completion for some MPI applications. However, enabling or adding strong progress to existing MPI implementations is not straightforward; changing such implementations is cumbersome, difficult, invasive, and time-consuming-a key motivation for our research MPI implementation, ExaMPI. Specifically, we tested the ability for ExaMPI's strong progress engine to enable overlap communication and computation, finding that considerable overlap is achieved without needing additional MPI “helper” calls such as MPI Test.","PeriodicalId":222289,"journal":{"name":"2021 Workshop on Exascale MPI (ExaMPI)","volume":"21 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121013241","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
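The overlap pattern the abstract refers to is generic: post non-blocking operations, do independent computation, and only then wait, with no intermediate "helper" calls. The sketch below shows that pattern with standard MPI calls and example buffer sizes; it is not ExaMPI-specific code, and whether the transfers actually progress during compute() depends on the MPI library's progress engine.

```cpp
// Generic overlap pattern: post MPI_Isend/MPI_Irecv, compute, then Waitall.
// With strong asynchronous progress the transfers advance in the background
// during compute(); with weak progress they may only advance inside Waitall.
#include <mpi.h>
#include <vector>

static void compute(std::vector<double>& local) {
  // Stand-in for application work that does not touch the message buffers.
  for (double& x : local) x = x * 0.5 + 1.0;
}

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  const int n = 1 << 20;                        // example message size
  std::vector<double> sendbuf(n, rank), recvbuf(n), local(n, 1.0);
  int right = (rank + 1) % size, left = (rank - 1 + size) % size;

  MPI_Request reqs[2];
  MPI_Irecv(recvbuf.data(), n, MPI_DOUBLE, left, 0, MPI_COMM_WORLD, &reqs[0]);
  MPI_Isend(sendbuf.data(), n, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

  compute(local);                               // no MPI_Test calls here

  MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
  MPI_Finalize();
  return 0;
}
```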
[Copyright notice]
2021 Workshop on Exascale MPI (ExaMPI) Pub Date: 2021-11-01 DOI: 10.1109/exampi54564.2021.00002
{"title":"[Copyright notice]","authors":"","doi":"10.1109/exampi54564.2021.00002","DOIUrl":"https://doi.org/10.1109/exampi54564.2021.00002","url":null,"abstract":"","PeriodicalId":222289,"journal":{"name":"2021 Workshop on Exascale MPI (ExaMPI)","volume":"109 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126107401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Proceedings of ExaMPI 2021: Workshop on Exascale MPI [Title page]
2021 Workshop on Exascale MPI (ExaMPI) Pub Date: 2021-11-01 DOI: 10.1109/exampi54564.2021.00001
{"title":"Proceedings of ExaMPI 2021: Workshop on Exascale MPI [Title page]","authors":"","doi":"10.1109/exampi54564.2021.00001","DOIUrl":"https://doi.org/10.1109/exampi54564.2021.00001","url":null,"abstract":"","PeriodicalId":222289,"journal":{"name":"2021 Workshop on Exascale MPI (ExaMPI)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128208352","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Message from the Workshop Chairs
2021 Workshop on Exascale MPI (ExaMPI) Pub Date: 2021-11-01 DOI: 10.1109/exampi54564.2021.00004
W. Scullin, N. Banglawala, Rosa M. Badia, James Clark
{"title":"Message from the Workshop Chairs","authors":"W. Scullin, N. Banglawala, Rosa M. Badia, James Clark","doi":"10.1109/exampi54564.2021.00004","DOIUrl":"https://doi.org/10.1109/exampi54564.2021.00004","url":null,"abstract":"This workshop’s program includes a keynote, two invited talks, six papers and four lightning talks. Paper topics include enabling shared memory access to Python processes in task-based programming models; workarounds for Python workflows in HPC environments; experiences in developing a distributed Agent-Based modelling (ABM) distributed toolkit in Python; new contributions to a distributed, asynchronous many-task (AMT) computing framework that encompasses the entire computing process, from a Jupyter front-end for managing code and results to the collection and visualization of performance data; a new high-performance Python API with a C++ core to represent data as a table and provide distributed data operations; and a computation environment for HPC that aims to accelerate microstructural analytics scaling Numpy workflows to enable multidimensional image analysis of diverse specimens.","PeriodicalId":222289,"journal":{"name":"2021 Workshop on Exascale MPI (ExaMPI)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126980316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A FACT-based Approach: Making Machine Learning Collective Autotuning Feasible on Exascale Systems
2021 Workshop on Exascale MPI (ExaMPI) Pub Date: 2021-11-01 DOI: 10.1109/ExaMPI54564.2021.00010
Michael Wilkins, Yanfei Guo, R. Thakur, N. Hardavellas, P. Dinda, Min Si
{"title":"A FACT-based Approach: Making Machine Learning Collective Autotuning Feasible on Exascale Systems","authors":"Michael Wilkins, Yanfei Guo, R. Thakur, N. Hardavellas, P. Dinda, Min Si","doi":"10.1109/ExaMPI54564.2021.00010","DOIUrl":"https://doi.org/10.1109/ExaMPI54564.2021.00010","url":null,"abstract":"According to recent performance analyses, MPI collective operations make up a quarter of the execution time on production systems. Machine learning (ML) autotuners use supervised learning to select collective algorithms, significantly improving collective performance. However, we observe two barriers preventing their adoption over the default heuristic-based autotuners. First, a user may find it difficult to compare autotuners because we lack a methodology to quantify their performance. We call this the performance quantification challenge. Second, to obtain the advertised performance, ML model training requires benchmark data from a vast majority of the feature space. Collecting such data regularly on large scale systems consumes far too much time and resources, and this will only get worse with exascale systems. We refer to this as the training data collection challenge. To address these challenges, we contribute (1) a performance evaluation framework to compare and improve collective au-totuner designs and (2) the Feature scaling, Active learning, Converge, Tune hyperparameters (FACT) approach, a three-part methodology to minimize the training data collection time (and thus maximize practicality at larger scale) without sacrificing accuracy. In the methodology, we first preprocess feature and output values based on domain knowledge. Then, we use active learning to iteratively collect only necessary training data points. Lastly, we perform hyperparameter tuning to further improve model accuracy without any additional data. On a production scale system, our methodology produces a model of equal accuracy using 6.88x less training data collection time.","PeriodicalId":222289,"journal":{"name":"2021 Workshop on Exascale MPI (ExaMPI)","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129655615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
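The centerpiece of the methodology above is active learning: rather than benchmarking the whole feature space, only the configurations the current model is least certain about are measured. The toy sketch below illustrates that loop with an invented uncertainty proxy (distance to the nearest already-measured configuration) and a fake benchmark function; it is not the FACT pipeline, its model, or its feature set.

```cpp
// Toy active-learning loop: repeatedly measure the candidate configuration
// that is least covered by existing training data, instead of benchmarking
// every point. The feature layout and uncertainty proxy are invented.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

struct Config { double log_msg_size; double log_num_ranks; };

static double dist(const Config& a, const Config& b) {
  double dx = a.log_msg_size - b.log_msg_size;
  double dy = a.log_num_ranks - b.log_num_ranks;
  return std::sqrt(dx * dx + dy * dy);
}

// Stand-in for running the collective benchmark at one configuration.
static double benchmark(const Config& c) {
  return c.log_msg_size + 0.1 * c.log_num_ranks;   // fake latency value
}

int main() {
  std::vector<Config> candidates;
  for (int s = 0; s < 20; ++s)
    for (int r = 0; r < 10; ++r)
      candidates.push_back({static_cast<double>(s), static_cast<double>(r)});

  std::vector<Config> measured = {candidates.front()};
  std::vector<double> labels   = {benchmark(candidates.front())};

  const std::size_t budget = 16;                   // affordable measurements
  while (measured.size() < budget) {
    // Pick the candidate farthest from all measured points.
    std::size_t best = 0;
    double best_gap = -1.0;
    for (std::size_t i = 0; i < candidates.size(); ++i) {
      double nearest = 1e300;
      for (const Config& m : measured)
        nearest = std::min(nearest, dist(candidates[i], m));
      if (nearest > best_gap) { best_gap = nearest; best = i; }
    }
    measured.push_back(candidates[best]);
    labels.push_back(benchmark(candidates[best]));
  }
  std::printf("collected %zu training points out of %zu candidates\n",
              measured.size(), candidates.size());
  return 0;
}
```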