{"title":"Sliding Substitution of Failed Nodes","authors":"A. Hori, Kazumi Yoshinaga, T. Hérault, Aurélien Bouteiller, G. Bosilca, Y. Ishikawa","doi":"10.1145/2802658.2802670","DOIUrl":"https://doi.org/10.1145/2802658.2802670","url":null,"abstract":"This paper considers the questions of how spare nodes should be allocated, how to substitute them for faulty nodes, and how much the communication performance is affected by such a substitution. The third question stems from the modification of the rank mapping by node substitutions, which can incur additional message collisions. In a stencil computation, rank mapping is done in a straightforward way on a Cartesian network without incurring any message collisions. However, once a substitution has occurred, the node- rank mapping may be destroyed. Therefore, these questions must be answered in a way that minimizes the degradation of communication performance. In this paper, several spare-node allocation and nodesubstitution methods will be proposed, analyzed, and compared in terms of communication performance following the substitution. It will be shown that when a failure occurs, the peer-to-peer (P2P) communication performance on the K computer can be slowed by a factor of three and collective performance can be cut in half. On BG/Q, P2P performance can be slowed by a factor of five and collective performance can be slowed by a factor of ten. However, those numbers can be reduced by using an appropriate substitution method.","PeriodicalId":365272,"journal":{"name":"Proceedings of the 22nd European MPI Users' Group Meeting","volume":"129 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116523598","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Memory Management System Optimized for BDMPI's Memory and Execution Model","authors":"J. Iverson, G. Karypis","doi":"10.1145/2802658.2802666","DOIUrl":"https://doi.org/10.1145/2802658.2802666","url":null,"abstract":"There is a growing need to perform large computations on small systems, as access to large systems is not widely available and cannot keep up with the scaling of data. BDMPI was recently introduced as a way of achieving this for applications written in MPI. BDMPI allows the efficient execution of standard MPI programs on systems whose aggregate amount of memory is smaller than that required by the computations and significantly outperforms other approaches. In this paper we present a virtual memory subsystem which we implemented as part of the BDMPI runtime. Our new virtual memory subsystem, which we call SBMA, bypasses the operating system virtual memory manager to take advantage of BDMPI's node-level cooperative multi-taking. Benchmarking using a synthetic application shows that for the use cases relevant to BDMPI, the overhead incurred by the BDMPI-SBMA system is amortized such that it performs as fast as explicit data movement by the application developer. Furthermore, we tested SBMA with three different classes of applications and our results show that with no modification to the original MPI program, speedups from 2×--12× over a standard BDMPI implementation can be achieved for the included applications.","PeriodicalId":365272,"journal":{"name":"Proceedings of the 22nd European MPI Users' Group Meeting","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132353643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Parallel Algorithm for Minimizing the Fleet Size in the Pickup and Delivery Problem with Time Windows","authors":"Miroslaw Blocho, J. Nalepa","doi":"10.1145/2802658.2802673","DOIUrl":"https://doi.org/10.1145/2802658.2802673","url":null,"abstract":"In this paper, we propose a parallel guided ejection search algorithm to minimize the fleet size in the NP-hard pickup and delivery problem with time windows. The parallel processes co-operate periodically to enhance the quality of results and to accelerate the convergence of computations. The experimental study shows that the parallel algorithm retrieves very high-quality results. Finally, we report 13 (22% of all considered benchmark tests) new world's best solutions.","PeriodicalId":365272,"journal":{"name":"Proceedings of the 22nd European MPI Users' Group Meeting","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124168238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable and Fault Tolerant Failure Detection and Consensus","authors":"Amogh Katti, G. D. Fatta, T. Naughton, C. Engelmann","doi":"10.1145/2802658.2802660","DOIUrl":"https://doi.org/10.1145/2802658.2802660","url":null,"abstract":"Future extreme-scale high-performance computing systems will be required to work under frequent component failures. The MPI Forum's User Level Failure Mitigation proposal has introduced an operation, MPI_Comm_shrink, to synchronize the alive processes on the list of failed processes, so that applications can continue to execute even in the presence of failures by adopting algorithm-based fault tolerance techniques. This MPI_Comm_shrink operation requires a fault tolerant failure detection and consensus algorithm. This paper presents and compares two novel failure detection and consensus algorithms. The proposed algorithms are based on Gossip protocols and are inherently fault-tolerant and scalable. The proposed algorithms were implemented and tested using the Extreme-scale Simulator. The results show that in both algorithms the number of Gossip cycles to achieve global consensus scales logarithmically with system size. The second algorithm also shows better scalability in terms of memory and network bandwidth usage and a perfect synchronization in achieving global consensus.","PeriodicalId":365272,"journal":{"name":"Proceedings of the 22nd European MPI Users' Group Meeting","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125318892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GPU-Aware Design, Implementation, and Evaluation of Non-blocking Collective Benchmarks","authors":"A. Awan, Khaled Hamidouche, Akshay Venkatesh, Jonathan L. Perkins, H. Subramoni, D. Panda","doi":"10.1145/2802658.2802672","DOIUrl":"https://doi.org/10.1145/2802658.2802672","url":null,"abstract":"As we move towards efficient exascale systems, heterogeneous accelerators like NVIDIA GPUs are becoming a significant compute component of modern HPC clusters. It has become important to utilize every single cycle of every compute device available in the system. From NICs to GPUs to Co-processors, heterogeneous compute resources are the way to move forward. Another important trend, especially with the introduction of non-blocking collective communication in the latest MPI standard, is overlapping communication with computation. It has become an important design goal for messaging libraries like MVAPICH2 and OpenMPI. In this paper, we present an important benchmark that allows the users of different MPI libraries to evaluate performance of GPU-Aware Non-Blocking Collectives. The main performance metrics are overlap and latency. We provide insights on designing a GPU-Aware benchmark and discuss the challenges associated with identifying and implementing performance parameters like overlap, latency, effect of MPI_Test() calls to progress communication, effect of independent GPU communication while the overlapped computation proceeds under the communication, and the effect of complexity, target, and scale of this overlapped computation. To illustrate the efficacy of the proposed benchmark, we provide a comparative performance evaluation of GPU-Aware Non-Blocking Collectives in MVAPICH2 and OpenMPI.","PeriodicalId":365272,"journal":{"name":"Proceedings of the 22nd European MPI Users' Group Meeting","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126899411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detecting Silent Data Corruption for Extreme-Scale MPI Applications","authors":"L. Bautista-Gomez, F. Cappello","doi":"10.1145/2802658.2802665","DOIUrl":"https://doi.org/10.1145/2802658.2802665","url":null,"abstract":"Next-generation supercomputers are expected to have more components and, at the same time, consume several times less energy per operation. These trends are pushing supercomputer construction to the limits of miniaturization and energy-saving strategies. Consequently, the number of soft errors is expected to increase dramatically in the coming years. While mechanisms are in place to correct or at least detect some soft errors, a significant percentage of those errors pass unnoticed by the hardware. Such silent errors are extremely damaging because they can make applications silently produce wrong results. In this work we propose a technique that leverages certain properties of high-performance computing applications in order to detect silent errors at the application level. Our technique detects corruption based solely on the behavior of the application datasets and is application-agnostic. We propose multiple corruption detectors, and we couple them to work together in a fashion transparent to the user. We demonstrate that this strategy can detect over 80% of corruptions, while incurring less than 1% of overhead. We show that the false positive rate is less than 1% and that when multi-bit corruptions are taken into account, the detection recall increases to over 95%.","PeriodicalId":365272,"journal":{"name":"Proceedings of the 22nd European MPI Users' Group Meeting","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114497635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MPI-focused Tracing with OTFX: An MPI-aware In-memory Event Tracing Extension to the Open Trace Format 2","authors":"M. Wagner, J. Doleschal, A. Knüpfer","doi":"10.1145/2802658.2802664","DOIUrl":"https://doi.org/10.1145/2802658.2802664","url":null,"abstract":"Performance analysis tools are more than ever inevitable to develop applications that utilize the enormous computing resources of high performance computing (HPC) systems. In event-based performance analysis the amount of collected data is one of the most urgent challenges. The resulting measurement bias caused by uncoordinated intermediate memory buffer flushes in the monitoring tool can render a meaningful analysis of the parallel behavior impossible. In this paper we address the impact of intermediate memory buffer flushes and present a method to avoid file interaction in the monitoring tool entirely. We propose an MPI-focused tracing approach that provides the complete MPI communication behavior and adapts the remaining application events to an amount that fits into a single memory buffer. We demonstrate the capabilities of our method with an MPI-focused prototype implementation of OTFX, based on the Open Trace Format 2, a state-of-the-art Open Source event tracing library used by the performance analysis tools Vampir, Scalasca, and Tau. In a comparison to OTF2 based on seven applications from different scientific domains, our prototype introduces in average 5.1% less overhead and reduces the trace size up to three orders of magnitude.","PeriodicalId":365272,"journal":{"name":"Proceedings of the 22nd European MPI Users' Group Meeting","volume":"140 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122276450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Toward Operating System Support for Scalable Multithreaded Message Passing","authors":"Balazs Gerofi, Masamichi Takagi, Y. Ishikawa","doi":"10.1145/2802658.2802661","DOIUrl":"https://doi.org/10.1145/2802658.2802661","url":null,"abstract":"Modern CPU architectures provide a large number of processing cores and application programmers are increasingly looking at hybrid programming models, where multiple threads of a single process interact with the MPI library simultaneously. Moreover, recent high-speed interconnection networks are being designed with capabilities targeting communication explicitly from multiple processor cores. As a result, scalability of the MPI library so that multithreaded applications can efficiently drive independent network communication has become a major concern. In this work, we propose a novel operating system level concept called the thread private shared library (TPSL), which enables threads of a multithreaded application to see specific shared libraries in a private fashion. Contrary to address spaces in traditional operating systems, where threads of a single process refer to the exact same set of virtual to physical mappings, our technique relies on per-thread separate page tables. Mapping the MPI library in a thread private fashion results in per-thread MPI ranks eliminating resource contention in the MPI library without the need for redesigning it. To demonstrate the benefits of our mechanism, we provide preliminary evaluation for various aspects of multithreaded MPI processing through micro-benchmarks on two widely used MPI implementations, MPICH and MVAPICH, with only minor modifications to the libraries.","PeriodicalId":365272,"journal":{"name":"Proceedings of the 22nd European MPI Users' Group Meeting","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126945905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An MPI Halo-Cell Implementation for Zero-Copy Abstraction","authors":"Jean-Baptiste Besnard, A. Malony, S. Shende, Marc Pérache, Patrick Carribault, Julien Jaeger","doi":"10.1145/2802658.2802669","DOIUrl":"https://doi.org/10.1145/2802658.2802669","url":null,"abstract":"In the race for Exascale, the advent of many-core processors will bring a shift in parallel computing architectures to systems of much higher concurrency, but with a relatively smaller memory per thread. This shift raises concerns for the adaptability of HPC software, for the current generation to the brave new world. In this paper, we study domain splitting on an increasing number of memory areas as an example problem where negative performance impact on computation could arise. We identify the specific parameters that drive scalability for this problem, and then model the halo-cell ratio on common mesh topologies to study the memory and communication implications. Such analysis argues for the use of shared-memory parallelism, such as with OpenMP, to address the performance problems that could occur. In contrast, we propose an original solution based entirely on MPI programming semantics, while providing the performance advantages of hybrid parallel programming. Our solution transparently replaces halo-cells transfers with pointer exchanges when MPI tasks are running on the same node, effectively removing memory copies. The results we present demonstrate gains in terms of memory and computation time on Xeon Phi (compared to OpenMP-only and MPI-only) using a representative domain decomposition benchmark.","PeriodicalId":365272,"journal":{"name":"Proceedings of the 22nd European MPI Users' Group Meeting","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133130107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the 22nd European MPI Users' Group Meeting","authors":"","doi":"10.1145/2802658","DOIUrl":"https://doi.org/10.1145/2802658","url":null,"abstract":"","PeriodicalId":365272,"journal":{"name":"Proceedings of the 22nd European MPI Users' Group Meeting","volume":"304 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122775196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}