Proceedings of the 22nd European MPI Users' Group Meeting: Latest Publications

Correctness Analysis of MPI-3 Non-Blocking Communications in PARCOACH
Proceedings of the 22nd European MPI Users' Group Meeting Pub Date : 2015-09-21 DOI: 10.1145/2802658.2802674
Julien Jaeger, Emmanuelle Saillard, Patrick Carribault, Denis Barthou
{"title":"Correctness Analysis of MPI-3 Non-Blocking Communications in PARCOACH","authors":"Julien Jaeger, Emmanuelle Saillard, Patrick Carribault, Denis Barthou","doi":"10.1145/2802658.2802674","DOIUrl":"https://doi.org/10.1145/2802658.2802674","url":null,"abstract":"MPI-3 provide functions for non-blocking collectives. To help programmers introduce non-blocking collectives to existing MPI programs, we improve the PARCOACH tool for checking correctness of MPI call sequences. These enhancements focus on correct call sequences of all flavor of collective calls, and on the presence of completion calls for all non-blocking communications. The evaluation shows an overhead under 10% of original compilation time.","PeriodicalId":365272,"journal":{"name":"Proceedings of the 22nd European MPI Users' Group Meeting","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115209570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
STCI: Scalable RunTime Component Infrastructure
Proceedings of the 22nd European MPI Users' Group Meeting Pub Date : 2015-09-21 DOI: 10.1145/2802658.2802675
Geoffroy R. Vallée, D. Bernholdt, S. Böhm, T. Naughton
{"title":"STCI: Scalable RunTime Component Infrastructure","authors":"Geoffroy R. Vallée, D. Bernholdt, S. Böhm, T. Naughton","doi":"10.1145/2802658.2802675","DOIUrl":"https://doi.org/10.1145/2802658.2802675","url":null,"abstract":"Geoffroy Vallee Oak Ridge National Laboratory 1 Bethel Valley Road Oak Ridge, Tennessee, USA valleegr@ornl.gov David Bernholdt Oak Ridge National Laboratory 1 Bethel Valley Road Oak Ridge, Tennessee, USA bernholdtde@ornl.gov Swen Bohm Oak Ridge National Laboratory 1 Bethel Valley Road Oak Ridge, Tennessee, USA bohms@ornl.gov Thomas Naughton Oak Ridge National Laboratory 1 Bethel Valley Road Oak Ridge, Tennessee, USA naughtont@ornl.gov","PeriodicalId":365272,"journal":{"name":"Proceedings of the 22nd European MPI Users' Group Meeting","volume":"138 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123458201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Plan B: Interruption of Ongoing MPI Operations to Support Failure Recovery
Proceedings of the 22nd European MPI Users' Group Meeting Pub Date : 2015-09-21 DOI: 10.1145/2802658.2802668
Aurélien Bouteiller, G. Bosilca, J. Dongarra
{"title":"Plan B: Interruption of Ongoing MPI Operations to Support Failure Recovery","authors":"Aurélien Bouteiller, G. Bosilca, J. Dongarra","doi":"10.1145/2802658.2802668","DOIUrl":"https://doi.org/10.1145/2802658.2802668","url":null,"abstract":"Advanced failure recovery strategies in HPC system benefit tremendously from in-place failure recovery, in which the MPI infrastructure can survive process crashes and resume communication services. In this paper we present the rationale behind the specification, and an effective implementation of the Revoke MPI operation. The purpose of the Revoke operation is the propagation of failure knowledge, and the interruption of ongoing, pending communication, under the control of the user. We explain that the Revoke operation can be implemented with a reliable broadcast over the scalable and failure resilient Binomial Graph (BMG) overlay network. Evaluation at scale, on a Cray XC30 supercomputer, demonstrates that the Revoke operation has a small latency, and does not introduce system noise outside of failure recovery periods.","PeriodicalId":365272,"journal":{"name":"Proceedings of the 22nd European MPI Users' Group Meeting","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134115159","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
Specification Guideline Violations by MPI_Dims_create
Proceedings of the 22nd European MPI Users' Group Meeting Pub Date : 2015-09-21 DOI: 10.1145/2802658.2802677
J. Träff, F. Lübbe
{"title":"Specification Guideline Violations by MPI_Dims_create","authors":"J. Träff, F. Lübbe","doi":"10.1145/2802658.2802677","DOIUrl":"https://doi.org/10.1145/2802658.2802677","url":null,"abstract":"In benchmarking a library providing alternative functionality for structured, so-called isomorphic, sparse collective communication [4], we found use for the MPI_Dims_create functionality of MPI [3] for suggesting a balanced factorization of a given number p (of MPI processes) into d factors that can be used as the dimension sizes in a d-dimensional Cartesian communicator. Much to our surprise, we observed that a) different MPI libraries can differ quite significantly in the factorization they suggest, and b) the produced factorizations can sometimes be quite far from balanced, indeed, for some composite numbers p some MPI libraries sometimes return trivial factorizations (p as factor). This renders the functionality, as implemented, useless. In this poster abstract, we elaborate on these findings.","PeriodicalId":365272,"journal":{"name":"Proceedings of the 22nd European MPI Users' Group Meeting","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131569549","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Isomorphic, Sparse MPI-like Collective Communication Operations for Parallel Stencil Computations
Proceedings of the 22nd European MPI Users' Group Meeting Pub Date : 2015-09-21 DOI: 10.1145/2802658.2802663
J. Träff, F. Lübbe, Antoine Rougier, S. Hunold
{"title":"Isomorphic, Sparse MPI-like Collective Communication Operations for Parallel Stencil Computations","authors":"J. Träff, F. Lübbe, Antoine Rougier, S. Hunold","doi":"10.1145/2802658.2802663","DOIUrl":"https://doi.org/10.1145/2802658.2802663","url":null,"abstract":"We propose a specification and discuss implementations of collective operations for parallel stencil-like computations that are not supported well by the current MPI 3.1 neighborhood collectives. In our isomorphic, sparse collectives all processes partaking in the communication operation use similar neighborhoods of processes with which to exchange data. Our interface assumes the p processes to be arranged in a d-dimensional torus (mesh) over which neighborhoods are specified per process by identical lists of relative coordinates. This extends significantly on the functionality for Cartesian communicators, and is a much lighter mechanism than distributed graph topologies. It allows for fast, local computation of communication schedules, and can be used in more dynamic contexts than current MPI functionality. We sketch three algorithms for neighborhoods with s source and target neighbors, namely a) a direct algorithm taking s communication rounds, b) a message-combining algorithm that communicates only along torus coordinates, and c) a message-combining algorithm using between [log s] and [log p] communication rounds. Our concrete interface has been implemented using the direct algorithm a). We benchmark our implementations and compare to the MPI neighborhood collectives. We demonstrate significant advantages in set-up times, and comparable communication times. Finally, we use our isomorphic, sparse collectives to implement a stencil computation with a deep halo, and discuss derived datatypes required for this application.","PeriodicalId":365272,"journal":{"name":"Proceedings of the 22nd European MPI Users' Group Meeting","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125400825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14
Efficient, Optimal MPI Datatype Reconstruction for Vector and Index Types
Proceedings of the 22nd European MPI Users' Group Meeting Pub Date : 2015-09-21 DOI: 10.1145/2802658.2802671
Martin Kalany, J. Träff
{"title":"Efficient, Optimal MPI Datatype Reconstruction for Vector and Index Types","authors":"Martin Kalany, J. Träff","doi":"10.1145/2802658.2802671","DOIUrl":"https://doi.org/10.1145/2802658.2802671","url":null,"abstract":"Type reconstruction is the process of finding an efficient representation in terms of space and processing time of a data layout as an MPI derived datatype. Practically efficient type reconstruction and normalization is important for high-quality MPI implementations that strive to provide good performance for communication operations involving noncontiguous data. Although it has recently been shown that the general problem of computing optimal tree representations of derived datatypes allowing any of the MPI derived datatype constructors can be solved in polynomial time, the algorithm for this may unfortunately be impractical for datatypes with large counts. By restricting the allowed constructors to vector and index-block type constructors, but excluding the most general MPI_Type_create_struct constructor, the problem can be solved much more efficiently. More precisely, we give a new O(n log n/log log n) time algorithm for finding cost-optimal representations of MPI type maps of length n using only vector and index-block constructors for a simple but flexible, additive cost model. This improves significantly over a previous O(n√n) time algorithm for the same problem, and the algorithm is simple enough to be considered for practical MPI libraries.","PeriodicalId":365272,"journal":{"name":"Proceedings of the 22nd European MPI Users' Group Meeting","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125604744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
MPI Advisor: a Minimal Overhead Tool for MPI Library Performance Tuning
Proceedings of the 22nd European MPI Users' Group Meeting Pub Date : 2015-09-21 DOI: 10.1145/2802658.2802667
E. Gallardo, Jérôme Vienne, L. Fialho, P. Teller, J. Browne
{"title":"MPI Advisor: a Minimal Overhead Tool for MPI Library Performance Tuning","authors":"E. Gallardo, Jérôme Vienne, L. Fialho, P. Teller, J. Browne","doi":"10.1145/2802658.2802667","DOIUrl":"https://doi.org/10.1145/2802658.2802667","url":null,"abstract":"A majority of parallel applications executed on HPC clusters use MPI for communication between processes. Most users treat MPI as a black box, executing their programs using the cluster's default settings. While the default settings perform adequately for many cases, it is well known that optimizing the MPI environment can significantly improve application performance. Although the existing optimization tools are effective when used by performance experts, they require deep knowledge of MPI library behavior and the underlying hardware architecture in which the application will be executed. Therefore, an easy-to-use tool that provides recommendations for configuring the MPI environment to optimize application performance is highly desirable. This paper addresses this need by presenting an easy-to-use methodology and tool, named MPI Advisor, that requires just a single execution of the input application to characterize its predominant communication behavior and determine the MPI configuration that may enhance its performance on the target combination of MPI library and hardware architecture. Currently, MPI Advisor provides recommendations that address the four most commonly occurring MPI-related performance bottlenecks, which are related to the choice of: 1) point-to-point protocol (eager vs. rendezvous), 2) collective communication algorithm, 3) MPI tasks-to-cores mapping, and 4) Infiniband transport protocol. The performance gains obtained by implementing the recommended optimizations in the case studies presented in this paper range from a few percent to more than 40%. Specifically, using this tool, we were able to improve the performance of HPCG with MVAPICH2 on four nodes of the Stampede cluster from 6.9 GFLOP/s to 10.1 GFLOP/s. Since the tool provides application-specific recommendations, it also informs the user about correct usage of MPI.","PeriodicalId":365272,"journal":{"name":"Proceedings of the 22nd European MPI Users' Group Meeting","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126794703","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 20
DAME: A Runtime-Compiled Engine for Derived Datatypes
Proceedings of the 22nd European MPI Users' Group Meeting Pub Date : 2015-09-21 DOI: 10.1145/2802658.2802659
Tarun Prabhu, W. Gropp
{"title":"DAME: A Runtime-Compiled Engine for Derived Datatypes","authors":"Tarun Prabhu, W. Gropp","doi":"10.1145/2802658.2802659","DOIUrl":"https://doi.org/10.1145/2802658.2802659","url":null,"abstract":"In order to achieve high performance on modern and future machines, applications need to make effective use of the complex, hierarchical memory system. Writing performance-portable code continues to be challenging since each architecture has unique memory access characteristics. In addition, some optimization decisions can only reasonably be made at runtime. This suggests that a two-pronged approach to address the challenge is required. First, provide the programmer with a means to express memory operations declaratively which will allow a runtime system to transparently access the memory in the best way and second, exploit runtime information. MPI's derived datatypes accomplish the former although their performance in current MPI implementations shows scope for improvement. JIT-compilation can be used for the latter. In this work, we present DAME --- a language and interpreter that is used as the backend for MPI's derived datatypes. We also present DAME-L and DAME-X, two JIT-enabled implementations of DAME. All three implementations have been integrated into MPICH. We evaluate the performance of our implementations using DDTBench and two mini-applications written with MPI derived datatypes and obtain communication speedups of up to 20x and mini-application speedup of 3x.","PeriodicalId":365272,"journal":{"name":"Proceedings of the 22nd European MPI Users' Group Meeting","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132139821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
Performance Evaluation of OpenFOAM* with MPI-3 RMA Routines on Intel® Xeon® Processors and Intel® Xeon Phi™ Coprocessors
Proceedings of the 22nd European MPI Users' Group Meeting Pub Date : 2015-09-21 DOI: 10.1145/2802658.2802676
Nishant Agrawal, Paul Edwards, Ambuj Pandey, Michael Klemm, Ravi Ojha, R. A. Razak
{"title":"Performance Evaluation of OpenFOAM* with MPI-3 RMA Routines on Intel® Xeon® Processors and Intel® Xeon Phi™ Coprocessors","authors":"Nishant Agrawal, Paul Edwards, Ambuj Pandey, Michael Klemm, Ravi Ojha, R. A. Razak","doi":"10.1145/2802658.2802676","DOIUrl":"https://doi.org/10.1145/2802658.2802676","url":null,"abstract":"OpenFOAM is a software package for solving partial differential equations and is very popular for computational fluid dynamics in the automotive segment. In this work, we describe our evaluation of the performance of OpenFOAM with MPI-3 Remote Memory Access (RMA) one-sided communication on the Intel® Xeon Phi\" coprocessor. Currently, OpenFOAM computes on a mesh that is decomposed among different MPI ranks, and it requires a high amount of communication between the neighboring ranks. MPI-3 offers RMA through a new API that decouples communication and synchronization. The aim is to achieve better performance with MPI-3 RMA routines as compared to the current two-sided asynchronous communication routines in OpenFOAM. We also showcase the challenges overcome in order to facilitate the different MPI-3 RMA routines in OpenFOAM. This discussion aims at analyzing the potential of MPI-3 RMA in OpenFOAM and benchmarking the performance on both the processor and the coprocessor. Our work also demonstrates that MPI-3 RMA in OpenFOAM can run in symmetric mode consisting of the Intel® Xeon® E5-2697v3 processor and the Intel® Xeon Phi™ 7120P coprocessor.","PeriodicalId":365272,"journal":{"name":"Proceedings of the 22nd European MPI Users' Group Meeting","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133461024","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
On the Impact of Synchronizing Clocks and Processes on Benchmarking MPI Collectives
Proceedings of the 22nd European MPI Users' Group Meeting Pub Date : 2015-09-21 DOI: 10.1145/2802658.2802662
S. Hunold, Alexandra Carpen-Amarie
{"title":"On the Impact of Synchronizing Clocks and Processes on Benchmarking MPI Collectives","authors":"S. Hunold, Alexandra Carpen-Amarie","doi":"10.1145/2802658.2802662","DOIUrl":"https://doi.org/10.1145/2802658.2802662","url":null,"abstract":"We consider the problem of accurately measuring the time to complete an MPI collective operation, as the result strongly depends on how the time is measured. Our goal is to develop an experimental method that allows for reproducible measurements of MPI collectives. When executing large parallel codes, MPI processes are often skewed in time when entering a collective operation. However, to obtain reproducible measurements, it is a common approach to synchronize all processes before they call the MPI collective operation. We therefore take a closer look at two commonly used process synchronization schemes: (1) relying on MPI_Barrier or (2) applying a window-based scheme using a common global time. We analyze both schemes experimentally and show the strengths and weaknesses of each approach. As window-based schemes require the notion of global time, we thoroughly evaluate different clock synchronization algorithms in various experiments. We also propose a novel clock synchronization algorithm that combines two advantages of known algorithms, which are (1) taking the inherent clock drift into account and (2) using a tree-based synchronization scheme to reduce the synchronization duration.","PeriodicalId":365272,"journal":{"name":"Proceedings of the 22nd European MPI Users' Group Meeting","volume":"07 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131216660","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12