{"title":"A hierarchical model to manage hardware topology in MPI applications","authors":"E. Jeannot, Farouk Mansouri, Guillaume Mercier","doi":"10.1145/3127024.3127030","DOIUrl":"https://doi.org/10.1145/3127024.3127030","url":null,"abstract":"The MPI standard is a major contribution in the landscape of parallel programming. Since its inception in the mid 90's it has ensured portability and performance for parallel applications on a wide spectrum of machines and architectures. With the advent of multicore machines, understanding and taking into account the underlying physical topology and memory hierarchy have become of paramount importance. The MPI standard in its current state, however, and despite recent evolutions is still unable to offer mechanisms to achieve this. In this paper, we detail several additions to the standard that give the user tools to address the hardware topology and data locality issues while improving application performance.","PeriodicalId":118516,"journal":{"name":"Proceedings of the 24th European MPI Users' Group Meeting","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125907972","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Offloaded MPI persistent collectives using persistent generalized request interface","authors":"M. Hatanaka, Masamichi Takagi, A. Hori, Y. Ishikawa","doi":"10.1145/3127024.3127029","DOIUrl":"https://doi.org/10.1145/3127024.3127029","url":null,"abstract":"This paper proposes a library with a persistent generalized request interface for the implementation of persistent communication operations. This interface allows developers to add persistent communication functions to the existing MPI library. We implemented a new generalized request interface which supports persistent communications because the generalized requests of the MPI standard lacks the features needed for persistent communications. We evaluate the expressiveness of the interface by developing five implementations of a persistent collective operation, namely, MPI_Neighbor_-alltoall_init: one utilizes the collective offload capability of Fujitsu FX100 Tofu2 interconnect and other four utilize the standard MPI functions and the Fujitsu-extended MPI functions. These implementations are evaluated on FX100 with micro-benchmark programs measuring latency. The results show that the offloaded version outperforms the existing implementations by more than a factor of two with data sizes up to 16 KiB, confirming that the proposed library interface facilitates the development of persistent collectives and the offloaded implementation exhibits the expected performance.","PeriodicalId":118516,"journal":{"name":"Proceedings of the 24th European MPI Users' Group Meeting","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115197176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Notified access in coarray fortran","authors":"A. Fanfarillo, D. D. Vento","doi":"10.1145/3127024.3127026","DOIUrl":"https://doi.org/10.1145/3127024.3127026","url":null,"abstract":"With the increasing availability of the Remote Direct Memory Access (RDMA) support in computer networks, the so called Partitioned Global Address Space (PGAS) model has evolved in the last few years. Although there are several cases where a PGAS approach can easily solve difficult message passing situations, like in particle tracking and adaptive mesh refinement applications, the producer-consumer pattern, usually adopted in task-based parallelism, can only be implemented inefficiently because of the separation between data transfer and synchronization (which is usually unified in message passing programming models). In this paper, we provide two contributions: 1) we propose an extension for the Fortran language that provides the concept of Notified Access by associating regular coarray variables with event variables. 2) We demonstrate that the MPI extension proposed by foMPI for Notified Access can be used effectively to implement the same concept in a PGAS run-time library like OpenCoarrays.","PeriodicalId":118516,"journal":{"name":"Proceedings of the 24th European MPI Users' Group Meeting","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115393539","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enabling hierarchy-aware MPI collectives in dynamically changing topologies","authors":"Simon Pickartz, Carsten Clauss, Stefan Lankes, A. Monti","doi":"10.1145/3127024.3127031","DOIUrl":"https://doi.org/10.1145/3127024.3127031","url":null,"abstract":"Hierarchy-awareness for message-passing has been around since the early 2000s with the emergence of SMP systems. Since then, many works dealt with the optimization of collective communication operations (so-called collectives) for such hierarchical topologies. However, until now, all these optimizations basically assume that the hierarchical topology remains static in a parallel program. In contrast, this paper strives for a discussion of how dynamically changing topologies can be considered during runtime, especially with focus on collective communication patterns. In doing so, the discussion starter for this is the possibility of process migration, e. g., in virtualized environments where the MPI processes are encapsulated within virtual machines. Consequently, processes originally located on distinct nodes can then (dynamically) become neighbors on the same SMP node. The central subject for the discussion on how such changes can be taken into account for optimized collectives is a new experimental MPI function that we propose and detail within this paper.","PeriodicalId":118516,"journal":{"name":"Proceedings of the 24th European MPI Users' Group Meeting","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131722397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the 24th European MPI Users' Group Meeting","authors":"Antonio J. Peña, P. Balaji, W. Gropp, R. Thakur","doi":"10.1145/3127024","DOIUrl":"https://doi.org/10.1145/3127024","url":null,"abstract":"","PeriodicalId":118516,"journal":{"name":"Proceedings of the 24th European MPI Users' Group Meeting","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133021699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PMIx: process management for exascale environments","authors":"R. Castain, David G. Solt, Joshua Hursey, Aurélien Bouteiller","doi":"10.1145/3127024.3127027","DOIUrl":"https://doi.org/10.1145/3127024.3127027","url":null,"abstract":"High-Performance Computing (HPC) applications have historically executed in static resource allocations, using programming models that ran independently from the resident system management stack (SMS). Achieving exascale performance that is both cost-effective and fits within site-level environmental constraints will, however, require that the application and SMS collaboratively orchestrate the flow of work to optimize resource utilization and compensate for on-the-fly faults. The Process Management Interface - Exascale (PMIx) community is committed to establishing scalable workflow orchestration by defining an abstract set of interfaces by which not only applications and tools can interact with the resident SMS, but also the various SMS components can interact with each other. This paper presents a high-level overview of the goals and current state of the PMIx standard, and lays out a roadmap for future directions.","PeriodicalId":118516,"journal":{"name":"Proceedings of the 24th European MPI Users' Group Meeting","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124207114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Characterizing MPI matching via trace-based simulation","authors":"Kurt B. Ferreira, Scott Levy, K. Pedretti, Ryan E. Grant","doi":"10.1145/3127024.3127040","DOIUrl":"https://doi.org/10.1145/3127024.3127040","url":null,"abstract":"With the increased scale expected on future leadership-class systems, detailed information about the resource usage and performance of MPI message matching provides important insights into how to maintain application performance on next-generation systems. However, obtaining MPI message matching performance data is often not possible without significant effort. A common approach is to instrument an MPI implementation to collect relevant statistics. While this approach can provide important data, collecting matching data at runtime perturbs the application's execution, including its matching performance, and is highly dependent on the MPI library's matchlist implementation. In this paper, we introduce a trace-based simulation approach to obtain detailed MPI message matching performance data for MPI applications without perturbing their execution. Using a number of key parallel workloads, we demonstrate that this simulator approach can rapidly and accurately characterize matching behavior. Specifically, we use our simulator to collect several important statistics about the operation of the MPI posted and unexpected queues. For example, we present data about search lengths and the duration that messages spend in the queues waiting to be matched. Data gathered using this simulation-based approach have significant potential to aid hardware designers in determining resource allocation for MPI matching functions and provide application and middleware developers with insight into the scalability issues associated with MPI message matching.","PeriodicalId":118516,"journal":{"name":"Proceedings of the 24th European MPI Users' Group Meeting","volume":"136 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123177156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhanced memory management for scalable MPI intra-node communication on many-core processor","authors":"Joong-Yeon Cho, Hyun-Wook Jin, Dukyun Nam","doi":"10.1145/3127024.3127035","DOIUrl":"https://doi.org/10.1145/3127024.3127035","url":null,"abstract":"As the number of cores installed in a single computing node drastically increases, the intra-node communication between parallel processes becomes more important. The parallel programming models, such as Message Passing Interface (MPI), internally perform memory-intensive operations for intra-node communication. Thus, to address the scalability issue on many-core processors, it is critical to exploit emerging memory features provided by the contemporary computer systems. For example, the latest many-core processors are equipped with a high-bandwidth on-package memory Modern 64-bit processors also support a large page size (e.g., 2MB), which can significantly reduce the number of TLB misses. The on-package memory and the huge pages have considerable potential for improving the performance of intra-node communication. However, such features are not thoroughly investigated in terms of intra-node communication in the literature. In this paper, we propose enhanced memory management schemes to efficiently utilize the on-package memory and provide support for huge pages. The proposed schemes can significantly reduce the data copy and memory mapping overheads in MPI intra-node communication. Our experimental results show that our implementation on MVAPICH2 can improve the bandwidth of point-to-point communication up to 373%, and can reduce the latency of collective communication by 79% on an Intel Xeon Phi Knights Landing (KNL) processor.","PeriodicalId":118516,"journal":{"name":"Proceedings of the 24th European MPI Users' Group Meeting","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129545075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Planning for performance: persistent collective operations for MPI","authors":"B. Morgan, Daniel J. Holmes, A. Skjellum, P. Bangalore, Srinivas Sridharan","doi":"10.1145/3127024.3127028","DOIUrl":"https://doi.org/10.1145/3127024.3127028","url":null,"abstract":"Advantages of nonblocking collective communication in MPI have been established over the past quarter century, even predating MPI-1. For regular computations with fixed communication patterns, more optimizations can be revealed through the use of persistence (planned transfers) not currently available in the MPI-3 API except for a limited form of point-to-point persistence (aka half-channels) standardized since MPI-1. This paper covers the design, prototype implementation of LibPNBC (based on LibNBC), and MPI-4 standardization status of persistent nonblocking collective operations. We provide early performance results, using a modified version of NBCBench and an example illustrating the potential performance enhancements for such operations. Persistent operations allow MPI implementations to make intelligent choices about algorithm and resource utilization once and amortize this decision cost across many uses in a long-running program. Evidence that this approach is of value is provided. As with non-persistent, nonblocking collective operations, the requirement for strong progress and blocking completion notification are jointly needed to maximize the benefit of such operations (e.g., overlap of communication with computation or other communication). Further enhancement of the current implementation prototype as well as additional opportunities to enhance performance through the application of these new APIs comprise future work.","PeriodicalId":118516,"journal":{"name":"Proceedings of the 24th European MPI Users' Group Meeting","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122393700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transforming blocking MPI collectives to Non-blocking and persistent operations","authors":"H. Ahmed, A. Skjellum, P. Bangalore, P. Pirkelbauer","doi":"10.1145/3127024.3127033","DOIUrl":"https://doi.org/10.1145/3127024.3127033","url":null,"abstract":"This paper describes Petal, a prototype tool that uses compiler-analysis techniques to automate code transformations to hide communication costs behind computation by replacing blocking MPI functions with corresponding nonblocking and persistent collective operations while maintaining legacy applications' correctness. In earlier work, we have already demonstrated Petal's ability to transform point-to-point MPI operations in complement to the results shown here. The contributions of this paper include the approach to collective operation transformations, a description of achieved functionality, examples of transformations, and demonstration of performance improvements obtained thus far on representative sample MPI programs. Depending on system scale and problem size, the transformations yield a speedup of up to a factor of two. This tool can be used to transform useful classes of new and legacy MPI programs to use the newest variants of MPI functions to improve performance without manual intervention for forthcoming HPC systems and updated versions of the MPI standard.","PeriodicalId":118516,"journal":{"name":"Proceedings of the 24th European MPI Users' Group Meeting","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127564166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}