2014 Workshop on Exascale MPI at Supercomputing Conference: Latest Publications

Simplifying the Recovery Model of User-Level Failure Mitigation
2014 Workshop on Exascale MPI at Supercomputing Conference. Pub Date: 2014-11-16. DOI: 10.1109/ExaMPI.2014.4
Wesley Bland, Kenneth Raffenetti, P. Balaji

Abstract: As resilience research in high-performance computing has matured, so too have the tools, libraries, and languages that result from it. The Message Passing Interface (MPI) Forum is considering the addition of fault tolerance to a future version of the MPI standard, and a new chapter called User-Level Failure Mitigation (ULFM) has been proposed to fill this need. However, as ULFM usage has become more widespread, many potential users are concerned about its complexity and the need to rewrite existing codes. In this paper, we present a usage model that is similar to the usage already common in existing codes but that does not require the user to restart the application (thereby incurring the costs of re-entering the batch queue, startup costs, etc.). We use a new implementation of ULFM in MPICH, a popular open source MPI implementation, and demonstrate the ULFM usage using the Monte Carlo Communication Kernel, a proxy-app developed by the Center for Exascale Simulation of Advanced Reactors. Results show that the approach used incurs a level of intrusiveness into the code similar to that of existing checkpoint/restart models, but with less overhead.
Citations: 12
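For orientation, the sketch below shows the general ULFM recovery flow that this paper builds on (return an error, revoke the communicator, shrink it to the survivors, and continue without restarting the job); it is not the paper's simplified usage model itself. It assumes the MPIX_-prefixed interface proposed to the MPI Forum and prototyped in MPICH and Open MPI (MPIX_Comm_revoke, MPIX_Comm_shrink, MPIX_ERR_PROC_FAILED, MPIX_ERR_REVOKED); exact names, header location, and availability vary by implementation, and do_communication_phase() is a hypothetical application routine.

```c
#include <mpi.h>
/* Some implementations expose the ULFM extensions via <mpi-ext.h>; adjust as needed. */

/* Stand-in for the application's real communication phase on comm;
 * returns the MPI error code of its last operation. */
static int do_communication_phase(MPI_Comm comm)
{
    return MPI_Barrier(comm);
}

/* Minimal shrink-and-continue sketch: on process failure, revoke the
 * communicator so every survivor exits the phase, shrink it to the
 * survivors, and resume without re-entering the batch queue. */
static void phase_with_recovery(MPI_Comm *comm)
{
    /* Return error codes to the caller instead of aborting the job. */
    MPI_Comm_set_errhandler(*comm, MPI_ERRORS_RETURN);

    int rc = do_communication_phase(*comm);
    if (rc != MPI_SUCCESS) {
        int eclass;
        MPI_Error_class(rc, &eclass);
        if (eclass == MPIX_ERR_PROC_FAILED || eclass == MPIX_ERR_REVOKED) {
            MPIX_Comm_revoke(*comm);             /* ensure all survivors leave the phase */

            MPI_Comm survivors;
            MPIX_Comm_shrink(*comm, &survivors); /* new communicator without failed ranks */
            MPI_Comm_free(comm);
            *comm = survivors;

            /* Application-level recovery (redistributing work, restoring state)
             * would go here before re-entering the phase. */
        }
    }
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    MPI_Comm work;
    MPI_Comm_dup(MPI_COMM_WORLD, &work);
    phase_with_recovery(&work);
    MPI_Comm_free(&work);
    MPI_Finalize();
    return 0;
}
```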
Early Experiences Co-Scheduling Work and Communication Tasks for Hybrid MPI+X Applications
2014 Workshop on Exascale MPI at Supercomputing Conference. Pub Date: 2014-11-16. DOI: 10.1109/ExaMPI.2014.6
Dylan T. Stark, R. Barrett, Ryan E. Grant, Stephen L. Olivier, K. Pedretti, C. Vaughan

Abstract: Advances in node-level architecture and interconnect technology needed to reach extreme scale necessitate a reevaluation of long-standing models of computation, in particular bulk synchronous processing. The end of Dennard scaling and the subsequent increases in CPU core counts with each successive generation of general-purpose processors have made the ability to leverage parallelism for communication an increasingly critical aspect of future extreme-scale application performance. But the use of massive multithreading in combination with MPI is an open research area, with many proposed approaches requiring code changes that can be infeasible for important large legacy applications already written in MPI. This paper covers the design and initial evaluation of an extension of a massively multithreaded runtime system supporting dynamic parallelism that interfaces with MPI to handle fine-grained parallel communication and communication-computation overlap. Our initial evaluation of the approach uses the ubiquitous stencil computation, in three dimensions, with the halo exchange as the driving example that has a demonstrated tie to real code bases. The preliminary results suggest that even for a very well-studied and balanced workload and message exchange pattern, co-scheduling work and communication tasks is effective at significant levels of decomposition using up to 131,072 cores. Furthermore, we demonstrate useful communication-computation overlap when handling blocking send and receive calls, and show evidence suggesting that we can decrease the burstiness of network traffic, with a corresponding decrease in the rate of stalls (congestion) seen on the host link and network.
Citations: 26
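As a point of reference for the overlap this paper targets, the sketch below expresses a halo exchange with communication-computation overlap using plain nonblocking MPI. It is only an illustration under assumed names (compute_interior, compute_boundary, a two-neighbor buffer layout); the paper's approach instead co-schedules such communication and work as fine-grained tasks in a multithreaded runtime rather than hand-coding the overlap as done here.

```c
#include <mpi.h>

/* Placeholder kernels; in a real code these would apply the stencil update. */
static void compute_interior(double *grid)                  { (void)grid; }
static void compute_boundary(double *grid, double *halo[2]) { (void)grid; (void)halo; }

/* One timestep of a 1-D-decomposed halo exchange: post the halo transfers,
 * compute the interior (which needs no remote data) while they progress,
 * then finish the boundary cells once the halos have arrived. */
void halo_step(MPI_Comm comm, const int nbr[2], double *sendbuf[2],
               double *recvbuf[2], int halo_count, double *grid)
{
    MPI_Request reqs[4];

    for (int i = 0; i < 2; i++) {
        MPI_Irecv(recvbuf[i], halo_count, MPI_DOUBLE, nbr[i], 0, comm, &reqs[i]);
        MPI_Isend(sendbuf[i], halo_count, MPI_DOUBLE, nbr[i], 0, comm, &reqs[2 + i]);
    }

    compute_interior(grid);                      /* overlaps with the transfers above */

    MPI_Waitall(4, reqs, MPI_STATUSES_IGNORE);   /* halos are now usable */

    compute_boundary(grid, recvbuf);
}
```

Neighbors at domain edges can be passed as MPI_PROC_NULL, in which case the corresponding transfers complete immediately.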
To INT_MAX... and Beyond! Exploring Large-Count Support in MPI
2014 Workshop on Exascale MPI at Supercomputing Conference. Pub Date: 2014-11-16. DOI: 10.1109/ExaMPI.2014.5
J. Hammond, Andreas Schäfer, R. Latham

Abstract: In order to describe a structured region of memory, the routines in the MPI standard use a (count, datatype) pair. The C specification for this convention uses an int type for the count. Since C int types are nearly always 32 bits large and signed, counting more than 2^31 elements poses a challenge. Instead of changing the existing MPI routines, and all consumers of those routines, the MPI Forum asserts that users can build up large datatypes from smaller types. To evaluate this hypothesis and to provide a user-friendly solution to the large-count issue, we have developed BigMPI, a library on top of MPI that maps large-count MPI-like functions to MPI-3 standard features. BigMPI demonstrates a way to perform such a construction, reveals shortcomings of the MPI standard, and uncovers bugs in MPI implementations.
Citations: 18
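The "build large datatypes from smaller types" construction the abstract refers to can be sketched as follows: describe a buffer of more than 2^31 elements as a run of int-sized chunks plus a remainder, glue the two pieces together with a struct type, and then communicate with count = 1 of the new type. This is a minimal illustration of the general technique rather than BigMPI's actual code; the chunk size and the make_large_double_type() helper are assumptions, and a production version (as BigMPI provides) must also handle edge cases such as a zero remainder or an empty chunk run.

```c
#include <mpi.h>

/* Build a datatype covering `total` contiguous doubles, where `total`
 * may exceed INT_MAX, out of int-sized pieces. */
static MPI_Datatype make_large_double_type(MPI_Count total)
{
    const MPI_Count chunk = 1 << 30;         /* elements per chunk; fits in an int */
    MPI_Count nchunks = total / chunk;
    MPI_Count rem     = total % chunk;       /* assumed nonzero here for brevity */

    MPI_Datatype chunk_t, chunks_t, rem_t, big_t;
    MPI_Type_contiguous((int)chunk, MPI_DOUBLE, &chunk_t);
    MPI_Type_contiguous((int)nchunks, chunk_t, &chunks_t);  /* the full-chunk run */
    MPI_Type_contiguous((int)rem, MPI_DOUBLE, &rem_t);      /* the leftover elements */

    /* Glue the chunk run and the remainder together at the right byte offset. */
    int          blocklens[2] = { 1, 1 };
    MPI_Aint     disps[2]     = { 0, (MPI_Aint)(nchunks * chunk * sizeof(double)) };
    MPI_Datatype types[2]     = { chunks_t, rem_t };
    MPI_Type_create_struct(2, blocklens, disps, types, &big_t);
    MPI_Type_commit(&big_t);

    MPI_Type_free(&chunk_t);
    MPI_Type_free(&chunks_t);
    MPI_Type_free(&rem_t);
    return big_t;
}

/* Usage sketch: MPI_Send(buf, 1, big_t, dest, tag, comm); then MPI_Type_free(&big_t). */
```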