Proceedings of the 24th European MPI Users' Group Meeting: Latest Publications

A hierarchical model to manage hardware topology in MPI applications
Pub Date: 2017-09-25 · DOI: 10.1145/3127024.3127030
E. Jeannot, Farouk Mansouri, Guillaume Mercier
Abstract: The MPI standard is a major contribution to the landscape of parallel programming. Since its inception in the mid-1990s, it has ensured portability and performance for parallel applications on a wide spectrum of machines and architectures. With the advent of multicore machines, understanding and taking into account the underlying physical topology and memory hierarchy have become of paramount importance. The MPI standard in its current state, however, and despite recent evolutions, is still unable to offer mechanisms to achieve this. In this paper, we detail several additions to the standard that give the user tools to address hardware topology and data locality issues while improving application performance.
Citations: 1
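The standard already exposes one level of this hierarchy: MPI_Comm_split_type with MPI_COMM_TYPE_SHARED yields one sub-communicator per shared-memory node. The minimal MPI-3 sketch below shows that baseline; it is only the starting point the paper's richer proposal extends, not the proposed interface itself.

```c
/* Baseline hierarchy handling with standard MPI-3: split the world
 * communicator into per-node (shared-memory) sub-communicators.
 * This is not the interface proposed in the paper, only the limited
 * mechanism already available in the standard. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Processes sharing a memory domain (typically a node) land in
     * the same sub-communicator. */
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);

    int node_rank, node_size;
    MPI_Comm_rank(node_comm, &node_rank);
    MPI_Comm_size(node_comm, &node_size);

    printf("world rank %d is rank %d of %d on its node\n",
           world_rank, node_rank, node_size);

    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}
```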
Offloaded MPI persistent collectives using persistent generalized request interface
Pub Date: 2017-09-25 · DOI: 10.1145/3127024.3127029
M. Hatanaka, Masamichi Takagi, A. Hori, Y. Ishikawa
Abstract: This paper proposes a library with a persistent generalized request interface for the implementation of persistent communication operations. The interface allows developers to add persistent communication functions to an existing MPI library. We implemented a new generalized request interface that supports persistent communication, because the generalized requests of the MPI standard lack the features needed for it. We evaluate the expressiveness of the interface by developing five implementations of a persistent collective operation, MPI_Neighbor_alltoall_init: one uses the collective offload capability of the Fujitsu FX100 Tofu2 interconnect, and the other four use standard MPI functions and Fujitsu-extended MPI functions. These implementations are evaluated on the FX100 with micro-benchmark programs measuring latency. The results show that the offloaded version outperforms the existing implementations by more than a factor of two for data sizes up to 16 KiB, confirming that the proposed library interface facilitates the development of persistent collectives and that the offloaded implementation delivers the expected performance.
Citations: 3
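Persistent neighborhood collectives of exactly this kind were later standardized in MPI-4. Assuming an MPI-4 library, the init/start/wait usage pattern looks roughly as follows; the paper's prototype instead layered the operation on a persistent generalized-request library and Tofu2 offload, so this sketch only illustrates the call sequence, not that implementation.

```c
/* Sketch of a persistent neighborhood all-to-all using the MPI-4
 * interface (requires an MPI-4 implementation). The collective is
 * planned once and then started repeatedly, which is what makes
 * offloading it to the interconnect worthwhile. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int size;
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* 1-D periodic ring: every rank has exactly two neighbors. */
    int dims[1] = { size }, periods[1] = { 1 };
    MPI_Comm ring;
    MPI_Cart_create(MPI_COMM_WORLD, 1, dims, periods, 0, &ring);

    double sendbuf[2] = { 1.0, 2.0 }, recvbuf[2];

    /* Plan the collective once ... */
    MPI_Request req;
    MPI_Neighbor_alltoall_init(sendbuf, 1, MPI_DOUBLE,
                               recvbuf, 1, MPI_DOUBLE,
                               ring, MPI_INFO_NULL, &req);

    /* ... then start and complete it every iteration. */
    for (int iter = 0; iter < 100; ++iter) {
        MPI_Start(&req);
        /* computation independent of recvbuf can overlap here */
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    }

    MPI_Request_free(&req);
    MPI_Comm_free(&ring);
    MPI_Finalize();
    return 0;
}
```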
Notified access in Coarray Fortran
Pub Date: 2017-09-25 · DOI: 10.1145/3127024.3127026
A. Fanfarillo, D. D. Vento
Abstract: With the increasing availability of Remote Direct Memory Access (RDMA) support in computer networks, the Partitioned Global Address Space (PGAS) model has evolved over the last few years. Although there are several cases where a PGAS approach can easily solve difficult message-passing situations, as in particle tracking and adaptive mesh refinement applications, the producer-consumer pattern usually adopted in task-based parallelism can only be implemented inefficiently, because of the separation between data transfer and synchronization (which are usually unified in message-passing programming models). In this paper, we provide two contributions: 1) we propose an extension to the Fortran language that provides the concept of Notified Access by associating regular coarray variables with event variables; 2) we demonstrate that the MPI extension proposed by foMPI for Notified Access can be used effectively to implement the same concept in a PGAS run-time library such as OpenCoarrays.
Citations: 4
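The inefficiency the abstract points at is that standard MPI-3 RMA separates the data transfer from the synchronization that tells the consumer the data has arrived. The hedged sketch below shows that baseline pattern in C (a put, a flush, and then a separate notification message; run with at least two ranks); Notified Access, as provided by the foMPI extension and the proposed coarray/event coupling, fuses the transfer and the notification into a single operation.

```c
/* Producer-consumer with plain MPI-3 RMA: the producer completes the
 * put, then must notify the consumer with an extra message. This is
 * the pattern Notified Access is designed to replace; it is not the
 * foMPI extension itself. Run with at least two ranks. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double slot = 0.0;                       /* consumer-side buffer */
    MPI_Win win;
    MPI_Win_create(&slot, sizeof slot, sizeof slot,
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);
    MPI_Win_lock_all(0, win);

    if (rank == 0) {                         /* producer */
        double value = 42.0;
        MPI_Put(&value, 1, MPI_DOUBLE, 1, 0, 1, MPI_DOUBLE, win);
        MPI_Win_flush(1, win);               /* data transfer done ...    */
        int token = 1;                       /* ... now notify separately */
        MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {                  /* consumer */
        int token;
        MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        MPI_Win_sync(win);                   /* make the put visible locally */
        /* only now is `slot` guaranteed to hold the produced value */
    }

    MPI_Win_unlock_all(win);
    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```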
Enabling hierarchy-aware MPI collectives in dynamically changing topologies
Pub Date: 2017-09-25 · DOI: 10.1145/3127024.3127031
Simon Pickartz, Carsten Clauss, Stefan Lankes, A. Monti
Abstract: Hierarchy-awareness for message passing has been around since the early 2000s, with the emergence of SMP systems. Since then, many works have dealt with the optimization of collective communication operations (so-called collectives) for such hierarchical topologies. Until now, however, all these optimizations have basically assumed that the hierarchical topology remains static throughout a parallel program. In contrast, this paper discusses how dynamically changing topologies can be taken into account at runtime, with a particular focus on collective communication patterns. The starting point for this discussion is the possibility of process migration, e.g., in virtualized environments where MPI processes are encapsulated within virtual machines; processes originally located on distinct nodes can then dynamically become neighbors on the same SMP node. The central subject of the discussion on how such changes can be taken into account in optimized collectives is a new experimental MPI function that we propose and detail in this paper.
Citations: 4
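For reference, the static baseline that hierarchy-aware collectives build on is a two-level scheme: reduce inside each node, allreduce across one leader per node, then broadcast node-locally. The sketch below assumes node_comm was created with MPI_Comm_split_type(MPI_COMM_TYPE_SHARED) and leader_comm groups the node-rank-0 processes (non-leaders may pass MPI_COMM_NULL); it is not the experimental function proposed in the paper. The paper's concern is precisely that the node mapping behind these two communicators can change at runtime, e.g. after VM migration.

```c
/* Classic two-level allreduce over a node-local communicator and a
 * communicator of node leaders. Assumed setup (not shown): node_comm
 * from MPI_Comm_split_type, leader_comm from splitting off the
 * node-rank-0 processes. */
#include <mpi.h>

static void hier_allreduce_sum(const double *in, double *out, int n,
                               MPI_Comm node_comm, MPI_Comm leader_comm)
{
    int node_rank;
    MPI_Comm_rank(node_comm, &node_rank);

    /* Step 1: node-local reduction to the node leader (node rank 0). */
    MPI_Reduce(in, out, n, MPI_DOUBLE, MPI_SUM, 0, node_comm);

    /* Step 2: allreduce among the node leaders only. */
    if (node_rank == 0)
        MPI_Allreduce(MPI_IN_PLACE, out, n, MPI_DOUBLE, MPI_SUM,
                      leader_comm);

    /* Step 3: node-local broadcast of the global result. */
    MPI_Bcast(out, n, MPI_DOUBLE, 0, node_comm);
}
```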
Proceedings of the 24th European MPI Users' Group Meeting
Pub Date: 2017-09-25 · DOI: 10.1145/3127024
Antonio J. Peña, P. Balaji, W. Gropp, R. Thakur
Citations: 2
PMIx: process management for exascale environments
Pub Date: 2017-09-25 · DOI: 10.1145/3127024.3127027
R. Castain, David G. Solt, Joshua Hursey, Aurélien Bouteiller
Abstract: High-Performance Computing (HPC) applications have historically executed in static resource allocations, using programming models that ran independently of the resident system management stack (SMS). Achieving exascale performance that is both cost-effective and fits within site-level environmental constraints will, however, require that the application and the SMS collaboratively orchestrate the flow of work to optimize resource utilization and compensate for on-the-fly faults. The Process Management Interface - Exascale (PMIx) community is committed to establishing scalable workflow orchestration by defining an abstract set of interfaces by which not only applications and tools can interact with the resident SMS, but also the various SMS components can interact with each other. This paper presents a high-level overview of the goals and current state of the PMIx standard, and lays out a roadmap for future directions.
Citations: 53
Characterizing MPI matching via trace-based simulation
Pub Date: 2017-09-25 · DOI: 10.1145/3127024.3127040
Kurt B. Ferreira, Scott Levy, K. Pedretti, Ryan E. Grant
Abstract: With the increased scale expected on future leadership-class systems, detailed information about the resource usage and performance of MPI message matching provides important insights into how to maintain application performance on next-generation systems. However, obtaining MPI message matching performance data is often not possible without significant effort. A common approach is to instrument an MPI implementation to collect relevant statistics. While this approach can provide important data, collecting matching data at runtime perturbs the application's execution, including its matching performance, and is highly dependent on the MPI library's matchlist implementation. In this paper, we introduce a trace-based simulation approach that obtains detailed MPI message matching performance data for MPI applications without perturbing their execution. Using a number of key parallel workloads, we demonstrate that this simulator approach can rapidly and accurately characterize matching behavior. Specifically, we use our simulator to collect several important statistics about the operation of the MPI posted and unexpected queues, for example, search lengths and the time messages spend in the queues waiting to be matched. Data gathered using this simulation-based approach have significant potential to aid hardware designers in determining resource allocation for MPI matching functions, and to provide application and middleware developers with insight into the scalability issues associated with MPI message matching.
Citations: 13
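As a conceptual aid (not the authors' trace-driven simulator), the matching logic being characterized can be modeled as a linear search over the posted-receive queue; the number of entries inspected before a hit is the search length such studies report, and a miss sends the message to the unexpected queue. A toy C sketch, with ANY standing in for the MPI wildcards:

```c
/* Toy model of MPI receive-side matching: an arriving message walks
 * the posted-receive queue in FIFO order; the traversal length is the
 * kind of statistic the paper's simulator collects. Conceptual only. */
#include <stdio.h>
#include <stdlib.h>

#define ANY -1        /* stands in for MPI_ANY_SOURCE / MPI_ANY_TAG */

struct posted_recv {
    int source, tag;
    struct posted_recv *next;
};

/* Return 1 and unlink the first matching entry, or 0 if the message is
 * "unexpected"; *search_len receives the number of entries inspected. */
static int match_arrival(struct posted_recv **head, int src, int tag,
                         int *search_len)
{
    int pos = 0;
    for (struct posted_recv **p = head; *p; p = &(*p)->next) {
        ++pos;
        int src_ok = ((*p)->source == ANY || (*p)->source == src);
        int tag_ok = ((*p)->tag    == ANY || (*p)->tag    == tag);
        if (src_ok && tag_ok) {
            struct posted_recv *hit = *p;
            *p = hit->next;
            free(hit);
            *search_len = pos;
            return 1;
        }
    }
    *search_len = pos;
    return 0;         /* would be appended to the unexpected queue */
}

int main(void)
{
    /* Post two receives: (source 3, any tag) then (any source, tag 7). */
    struct posted_recv *second = malloc(sizeof *second);
    second->source = ANY; second->tag = 7; second->next = NULL;
    struct posted_recv *queue = malloc(sizeof *queue);
    queue->source = 3; queue->tag = ANY; queue->next = second;

    int len;
    int hit = match_arrival(&queue, 5, 7, &len);
    printf("matched=%d, search length=%d\n", hit, len);   /* 1, 2 */

    free(queue);
    return 0;
}
```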
Enhanced memory management for scalable MPI intra-node communication on many-core processor
Pub Date: 2017-09-25 · DOI: 10.1145/3127024.3127035
Joong-Yeon Cho, Hyun-Wook Jin, Dukyun Nam
Abstract: As the number of cores installed in a single computing node drastically increases, intra-node communication between parallel processes becomes more important. Parallel programming models such as the Message Passing Interface (MPI) internally perform memory-intensive operations for intra-node communication. Thus, to address the scalability issue on many-core processors, it is critical to exploit the emerging memory features provided by contemporary computer systems. For example, the latest many-core processors are equipped with high-bandwidth on-package memory, and modern 64-bit processors support large page sizes (e.g., 2 MB), which can significantly reduce the number of TLB misses. The on-package memory and huge pages have considerable potential for improving the performance of intra-node communication, but these features have not been thoroughly investigated for intra-node communication in the literature. In this paper, we propose enhanced memory management schemes to efficiently utilize the on-package memory and to provide support for huge pages. The proposed schemes can significantly reduce the data copy and memory mapping overheads in MPI intra-node communication. Our experimental results show that our implementation on MVAPICH2 can improve the bandwidth of point-to-point communication by up to 373% and reduce the latency of collective communication by 79% on an Intel Xeon Phi Knights Landing (KNL) processor.
Citations: 4
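The huge-page ingredient can be approximated in user space on Linux with mmap and MAP_HUGETLB, as in the hedged sketch below; this is not the MVAPICH2-internal scheme described in the paper, and placing the buffer in on-package MCDRAM would additionally require something like the memkind library.

```c
/* Back a large copy buffer with 2 MB huge pages to reduce TLB misses,
 * falling back to regular pages if none are reserved. Linux-specific
 * illustration of one ingredient of the paper's scheme. */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define HUGE_PAGE_SIZE (2UL * 1024 * 1024)

int main(void)
{
    size_t len = 8 * HUGE_PAGE_SIZE;          /* 16 MiB staging buffer */
    int huge = 1;

    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (buf == MAP_FAILED) {
        /* Huge pages may not be reserved (see /proc/sys/vm/nr_hugepages). */
        huge = 0;
        buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED) { perror("mmap"); return 1; }
    }

    memset(buf, 0, len);                      /* touch every page */
    printf("%zu-byte buffer, huge pages: %s\n", len, huge ? "yes" : "no");

    munmap(buf, len);
    return 0;
}
```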
Planning for performance: persistent collective operations for MPI
Pub Date: 2017-09-25 · DOI: 10.1145/3127024.3127028
B. Morgan, Daniel J. Holmes, A. Skjellum, P. Bangalore, Srinivas Sridharan
Abstract: The advantages of nonblocking collective communication in MPI have been established over the past quarter century, even predating MPI-1. For regular computations with fixed communication patterns, further optimizations can be unlocked through persistence (planned transfers), which is not available in the current MPI-3 API except for a limited form of point-to-point persistence (a.k.a. half-channels) standardized since MPI-1. This paper covers the design, a prototype implementation (LibPNBC, based on LibNBC), and the MPI-4 standardization status of persistent nonblocking collective operations. We provide early performance results, using a modified version of NBCBench and an example illustrating the potential performance enhancements of such operations. Persistent operations allow MPI implementations to make intelligent choices about algorithms and resource utilization once and to amortize this decision cost across many uses in a long-running program; evidence that this approach is of value is provided. As with non-persistent nonblocking collective operations, strong progress and blocking completion notification are jointly needed to maximize the benefit of such operations (e.g., overlap of communication with computation or other communication). Further enhancement of the current prototype, as well as additional opportunities to enhance performance through the application of these new APIs, comprise future work.
Citations: 10
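The planned-transfer pattern argued for here was subsequently standardized in MPI-4. Assuming an MPI-4 library (the paper's LibPNBC prototype predates the standardized names), the setup cost is paid once at init time and amortized over every use:

```c
/* Persistent allreduce in the form later standardized by MPI-4:
 * plan once, then start/complete in every iteration. Requires an
 * MPI-4 implementation; shown only to illustrate the usage pattern
 * the paper advocates. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    double local[4] = { 1, 2, 3, 4 }, global[4];

    /* Plan the reduction once: the library may choose an algorithm
     * and set up resources here. */
    MPI_Request req;
    MPI_Allreduce_init(local, global, 4, MPI_DOUBLE, MPI_SUM,
                       MPI_COMM_WORLD, MPI_INFO_NULL, &req);

    for (int iter = 0; iter < 1000; ++iter) {
        /* ... update local[] ... */
        MPI_Start(&req);
        /* computation independent of global[] can overlap here */
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        /* ... use global[] ... */
    }

    MPI_Request_free(&req);
    MPI_Finalize();
    return 0;
}
```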
Transforming blocking MPI collectives to non-blocking and persistent operations
Pub Date: 2017-09-25 · DOI: 10.1145/3127024.3127033
H. Ahmed, A. Skjellum, P. Bangalore, P. Pirkelbauer
Abstract: This paper describes Petal, a prototype tool that uses compiler-analysis techniques to automate code transformations that hide communication costs behind computation, replacing blocking MPI functions with corresponding non-blocking and persistent collective operations while maintaining the correctness of legacy applications. In earlier work, we demonstrated Petal's ability to transform point-to-point MPI operations, complementing the results shown here. The contributions of this paper include the approach to collective operation transformations, a description of the achieved functionality, examples of transformations, and a demonstration of the performance improvements obtained thus far on representative sample MPI programs. Depending on system scale and problem size, the transformations yield a speedup of up to a factor of two. The tool can be used to transform useful classes of new and legacy MPI programs to use the newest variants of MPI functions, improving performance without manual intervention for forthcoming HPC systems and updated versions of the MPI standard.
Citations: 9
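The flavor of rewrite being automated can be shown by hand (illustrative only, not Petal's output): the blocking collective is split into an immediate start and a wait placed after computation that does not depend on its result.

```c
/* Before/after sketch of the transformation: a blocking allreduce is
 * replaced by MPI_Iallreduce plus a wait, so that independent work
 * overlaps with the communication. Illustrative, hand-written code. */
#include <mpi.h>

void step_blocking(double *halo_sum, double *interior, int n,
                   MPI_Comm comm)
{
    /* Before: everything waits for the collective to finish. */
    MPI_Allreduce(MPI_IN_PLACE, halo_sum, 1, MPI_DOUBLE, MPI_SUM, comm);
    for (int i = 0; i < n; ++i)          /* independent of halo_sum */
        interior[i] *= 0.5;
    /* ... code that actually needs halo_sum follows ... */
}

void step_overlapped(double *halo_sum, double *interior, int n,
                     MPI_Comm comm)
{
    /* After: start the collective, do the independent work, wait only
     * where the result is first used. */
    MPI_Request req;
    MPI_Iallreduce(MPI_IN_PLACE, halo_sum, 1, MPI_DOUBLE, MPI_SUM,
                   comm, &req);
    for (int i = 0; i < n; ++i)
        interior[i] *= 0.5;
    MPI_Wait(&req, MPI_STATUS_IGNORE);
    /* ... code that actually needs halo_sum follows ... */
}
```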