{"title":"Simplifying the Recovery Model of User-Level Failure Mitigation","authors":"Wesley Bland, Kenneth Raffenetti, P. Balaji","doi":"10.1109/ExaMPI.2014.4","DOIUrl":"https://doi.org/10.1109/ExaMPI.2014.4","url":null,"abstract":"As resilience research in high-performance computing has matured, so too have the tools, libraries, and languages that result from it. The Message Passing Interface (MPI) Forum is considering the addition of fault tolerance to a future version of the MPI standard, and a new chapter called User-Level Failure Mitigation (ULFM) has been proposed to fill this need. However, as ULFM usage has become more widespread, many potential users are concerned about its complexity and the need to rewrite existing codes. In this paper, we present a usage model that is similar to the usage already common in existing codes but that does not require the user to restart the application (thereby incurring the costs of re-entering the batch queue, startup costs, etc.). We use a new implementation of ULFM in MPICH, a popular open source MPI implementation, and demonstrate the ULFM usage using the Monte Carlo Communication Kernel, a proxy-app developed by the Center for Exascale Simulation of Advanced Reactors. Results show that the approach used incurs a level of intrusiveness into the code similar to that of existing checkpoint/restart models, but with less overhead.","PeriodicalId":425070,"journal":{"name":"2014 Workshop on Exascale MPI at Supercomputing Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128913109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Early Experiences Co-Scheduling Work and Communication Tasks for Hybrid MPI+X Applications","authors":"Dylan T. Stark, R. Barrett, Ryan E. Grant, Stephen L. Olivier, K. Pedretti, C. Vaughan","doi":"10.1109/ExaMPI.2014.6","DOIUrl":"https://doi.org/10.1109/ExaMPI.2014.6","url":null,"abstract":"Advances in node-level architecture and interconnect technology needed to reach extreme scale necessitate a reevaluation of long-standing models of computation, in particular bulk synchronous processing. The end of Dennard-scaling and subsequent increases in CPU core counts each successive generation of general purpose processor has made the ability to leverage parallelism for communication an increasingly critical aspect for future extreme-scale application performance. But the use of massive multithreading in combination with MPI is an open research area, with many proposed approaches requiring code changes that can be unfeasible for important large legacy applications already written in MPI. This paper covers the design and initial evaluation of an extension of a massive multithreading runtime system supporting dynamic parallelism to interface with MPI to handle fine-grain parallel communication and communication-computation overlap. Our initial evaluation of the approach uses the ubiquitous stencil computation, in three dimensions, with the halo exchange as the driving example that has a demonstrated tie to real code bases. The preliminary results suggest that even for a very well-studied and balanced workload and message exchange pattern, co-scheduling work and communication tasks is effective at significant levels of decomposition using up to 131,072 cores. Furthermore, we demonstrate useful communication-computation overlap when handling blocking send and receive calls, and show evidence suggesting that we can decrease the burstiness of network traffic, with a corresponding decrease in the rate of stalls (congestion) seen on the host link and network.","PeriodicalId":425070,"journal":{"name":"2014 Workshop on Exascale MPI at Supercomputing Conference","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125394976","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"To INT_MAX... and Beyond! Exploring Large-Count Support in MPI","authors":"J. Hammond, Andreas Schäfer, R. Latham","doi":"10.1109/ExaMPI.2014.5","DOIUrl":"https://doi.org/10.1109/ExaMPI.2014.5","url":null,"abstract":"In order to describe a structured region of memory, the routines in the MPI standard use a (count, datatype) pair. The C specification for this convention uses an int type for the count. Since C int types are nearly always 32 bits large and signed, counting more than 231 elements poses a challenge. Instead of changing the existing MPI routines, and all consumers of those routines, the MPI Forum asserts that users can build up large datatypes from smaller types. To evaluate this hypothesis and to provide a user-friendly solution to the large-count issue, we have developed BigMPI, a library on top of MPI that maps large-count MPI-like functions to MPI-3 standard features. BigMPI demonstrates a way to perform such a construction, reveals shortcomings of the MPI standard, and uncovers bugs in MPI implementations.","PeriodicalId":425070,"journal":{"name":"2014 Workshop on Exascale MPI at Supercomputing Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125816935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}