Proceedings of the 23rd European MPI Users' Group Meeting: Latest Publications

MPI usage at NERSC: Present and Future
Pub Date: 2016-09-25 | DOI: 10.1145/2966884.2966894
Authors: A. Koniges, B. Cook, J. Deslippe, T. Kurth, H. Shan
Abstract: In this poster, we describe how MPI is used at the National Energy Research Scientific Computing Center (NERSC). NERSC is the production high-performance computing center for the US Department of Energy, with more than 5000 users and 800 distinct projects. Through a variety of tools (e.g., user surveys and application team collaborations), we determine how MPI is used on our latest systems, with a particular focus on advanced features and on how early applications intend to use MPI on NERSC's upcoming Intel Knights Landing (KNL) many-core system, one of the first such systems to be deployed. In the poster, we also compare the usage of MPI to exascale developmental programming models such as UPC++ and HPX, with an eye on which features and extensions to MPI are plausible and useful for NERSC users. We also discuss perceived shortcomings of MPI and why certain groups use other parallel programming models on our systems. In addition to a broad survey of the NERSC HPC population, we follow the evolution of a few key application codes that are being highly optimized for the KNL architecture using advanced OpenMP techniques. We study how these highly optimized on-node proxy apps and full applications start to make the transition to full hybrid MPI+OpenMP implementations on the self-hosted KNL system.
Citations: 5
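The transition to full hybrid MPI+OpenMP mentioned in the abstract typically starts from a threaded MPI initialization, with OpenMP handling on-node parallelism and MPI spanning nodes. The following is a minimal sketch of that pattern, not code from the poster; the loop body is a made-up stand-in for on-node work.

```c
/* Minimal hybrid MPI+OpenMP sketch: one MPI rank per node (or NUMA domain),
 * OpenMP threads inside the rank. Illustrative only. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank;
    /* FUNNELED: only the master thread makes MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local_sum = 0.0;
    #pragma omp parallel for reduction(+:local_sum)
    for (int i = 0; i < 1000000; i++)
        local_sum += 1.0 / (i + 1.0);   /* stand-in for on-node work */

    double global_sum;
    MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM,
                  MPI_COMM_WORLD);
    if (rank == 0) printf("sum = %f\n", global_sum);

    MPI_Finalize();
    return 0;
}
```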
Introducing Task-Containers as an Alternative to Runtime-Stacking
Pub Date: 2016-09-25 | DOI: 10.1145/2966884.2966910
Authors: Jean-Baptiste Besnard, Julien Adam, S. Shende, Marc Pérache, Patrick Carribault, Julien Jaeger
Abstract: The advent of many-core architectures poses new challenges to the MPI programming model, which was designed for distributed-memory message passing. It is now clear that MPI will have to evolve in order to exploit shared-memory parallelism, either by collaborating with other programming models (MPI+X) or by introducing new shared-memory approaches. This paper considers extensions to C and C++ that make it possible for MPI processes to run inside threads. More generally, a thread-local storage (TLS) library is developed to simplify the collocation of arbitrary tasks and services in a shared-memory context called a task-container. The paper discusses how such containers simplify model and service mixing at the OS process level, eventually easing the collocation of arbitrary tasks with MPI processes in a runtime-agnostic fashion and opening alternatives to runtime stacking.
Citations: 10
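The paper's task-container TLS library is not reproduced here; the sketch below only illustrates the underlying privatization idea in plain C11 with pthreads: a variable that would be a per-process global under ordinary MPI becomes thread-local when several tasks share one OS process. The task_rank name and the use of raw pthreads are illustrative assumptions.

```c
/* Generic illustration of the TLS privatization idea behind task-containers:
 * a formerly global variable becomes per-task state when tasks share one
 * OS process. Not the paper's library; plain C11 + pthreads. */
#include <pthread.h>
#include <stdio.h>

static _Thread_local int task_rank = -1;   /* was: a process-wide global */

static void *task_main(void *arg)
{
    task_rank = *(int *)arg;               /* each task sees its own copy */
    printf("task %d runs with private state\n", task_rank);
    return NULL;
}

int main(void)
{
    pthread_t t[4];
    int ids[4] = {0, 1, 2, 3};
    for (int i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, task_main, &ids[i]);
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    return 0;
}
```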
Performance comparison of Eulerian kinetic Vlasov code between flat-MPI parallelism and hybrid parallelism on Fujitsu FX100 supercomputer
Pub Date: 2016-09-25 | DOI: 10.1145/2966884.2966891
Authors: T. Umeda, K. Fukazawa
Abstract: The present study deals with the Vlasov simulation code, which solves the first-principle kinetic equations, called the Vlasov equation, for space plasma. A five-dimensional Vlasov code with two spatial dimensions and three velocity dimensions is parallelized in two ways: flat-MPI parallelism and hybrid MPI-OpenMP parallelism. The two versions of the parallel Vlasov code are benchmarked on the massively parallel Fujitsu FX100 supercomputer, a second-generation successor to the K computer architecture in Japan. In the performance comparison, we vary the number of threads per node from 1 (flat-MPI parallelism) to 32. The results show that the MPI-OpenMP hybrid parallelism outperforms flat-MPI for any number of compute nodes. There is an optimum number of threads per node that depends on the number of compute nodes, and the optimum number of threads per node becomes larger as the number of compute nodes grows. This is because the communication time of an MPI collective communication subroutine, used for the convergence check of iterative methods, can be reduced by decreasing the total number of processes with the MPI-OpenMP hybrid parallelism.
Citations: 0
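The convergence check mentioned at the end of the abstract is the kind of operation whose cost shrinks under hybrid parallelism, because the MPI_Allreduce spans fewer processes when threads handle the on-node reduction. A hedged sketch of that pattern follows; it is not the authors' code, and the residual computation is a placeholder.

```c
/* Sketch of a hybrid convergence check: the residual is reduced across
 * OpenMP threads on the node first, then across MPI processes. With hybrid
 * parallelism the MPI_Allreduce involves fewer processes. Illustrative only. */
#include <mpi.h>
#include <omp.h>
#include <math.h>

int main(int argc, char **argv)
{
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    double local_res = 0.0;
    #pragma omp parallel for reduction(max:local_res)
    for (int i = 0; i < 100000; i++) {
        double r = fabs(sin((double)i)) * 1e-7;   /* stand-in residual */
        if (r > local_res) local_res = r;
    }

    double global_res;
    MPI_Allreduce(&local_res, &global_res, 1, MPI_DOUBLE, MPI_MAX,
                  MPI_COMM_WORLD);

    if (global_res < 1e-6) { /* the iterative solver would stop here */ }

    MPI_Finalize();
    return 0;
}
```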
Space Performance Tradeoffs in Compressing MPI Group Data Structures
Pub Date: 2016-09-25 | DOI: 10.1145/2966884.2966911
Authors: Sameer Kumar, P. Heidelberger, C. Stunkel
Abstract: MPI is a popular programming paradigm on parallel machines today. MPI libraries sometimes use O(N) data structures to implement MPI functionality. The IBM Blue Gene/Q machine has 16 GB of memory per node; if each node runs 32 MPI processes, only 512 MB is available per process, requiring the MPI library to be space efficient. This scenario will become more severe on a future exascale machine with tens of millions of cores and MPI endpoints. We explore techniques to compress the dense O(N) mapping data structures that map the logical process ID to the global rank. Our techniques minimize topological communicator mapping state by replacing table lookups with a mapping function. We also explore caching schemes that reduce the overhead of the mapping functions for recent translations, and present performance results for multiple MPI micro-benchmarks as well as the 3D FFT and Algebraic Multigrid application benchmarks.
Citations: 1
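The paper's compressed representations are not reproduced here; the sketch below only illustrates the general trade-off in plain C, replacing an O(N) rank-translation table with an O(1) mapping function plus a one-entry cache for recent translations, under the simplifying assumption of a regular (start/stride) subgroup.

```c
/* Illustration of the general idea: translate a communicator-local rank to a
 * global rank with a formula instead of an O(N) table, and cache the most
 * recent translation. Assumes a regular subgroup; the paper's actual
 * compressed structures are more general. */
#include <stdio.h>

typedef struct {
    int start, stride;         /* regular subgroup: global = start + r*stride */
    int cached_local;          /* one-entry cache for recent translations */
    int cached_global;
} rank_map_t;

static int local_to_global(rank_map_t *m, int local_rank)
{
    if (local_rank == m->cached_local)         /* cache hit */
        return m->cached_global;
    int g = m->start + local_rank * m->stride; /* O(1) mapping function */
    m->cached_local  = local_rank;
    m->cached_global = g;
    return g;
}

int main(void)
{
    rank_map_t map = { .start = 4, .stride = 8,
                       .cached_local = -1, .cached_global = -1 };
    for (int r = 0; r < 4; r++)
        printf("local %d -> global %d\n", r, local_to_global(&map, r));
    return 0;
}
```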
Infrastructure and API Extensions for Elastic Execution of MPI Applications
Pub Date: 2016-09-25 | DOI: 10.1145/2966884.2966917
Authors: Isaías A. Comprés Ureña, Ao Mo-Hellenbrand, M. Gerndt, H. Bungartz
Abstract: Dynamic process support was added to MPI in version 2.0 of the standard. This feature of MPI has not been widely used by application developers, in part due to the performance cost and limitations of the spawn operation. In this paper, we propose an extension to MPI that consists of four new operations. These operations allow an application to be initialized in an elastic mode of execution and to enter an adaptation window when necessary, where resources are incorporated into or released from the application's world communicator. A prototype solution based on the MPICH library and the SLURM resource manager is presented and evaluated alongside an elastic scientific application that makes use of the new MPI extensions. The cost of these new operations is shown to be negligible, due mainly to the latency-hiding design, leaving the application's time for data redistribution as the only significant performance cost.
Citations: 36
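The four proposed operations are MPI extensions, and their names and signatures are not reproduced here. For context, the sketch below shows the standard MPI-2 dynamic-process mechanism, MPI_Comm_spawn, whose cost and limitations the abstract cites as motivation; ./worker is a placeholder executable name.

```c
/* Baseline dynamic-process mechanism (standard MPI_Comm_spawn) that the
 * elastic-execution extensions aim to improve on. Illustrative only. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Comm workers;                 /* intercommunicator to spawned ranks */
    int errcodes[4];
    MPI_Comm_spawn("./worker",        /* placeholder executable */
                   MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                   0, MPI_COMM_WORLD, &workers, errcodes);

    if (rank == 0)
        printf("spawned 4 additional processes\n");

    /* Merging into one intracommunicator is possible with
       MPI_Intercomm_merge, but the spawn itself does not grow
       MPI_COMM_WORLD -- one of the limitations discussed above. */
    MPI_Comm_disconnect(&workers);
    MPI_Finalize();
    return 0;
}
```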
Distributed Memory Implementation Strategies for the kinetic Monte Carlo Algorithm
Pub Date: 2016-09-25 | DOI: 10.1145/2966884.2966908
Authors: António Esteves, Alfredo Moura
Abstract: This paper presents strategies to parallelize a previously implemented kinetic Monte Carlo (kMC) algorithm. The process under simulation is precipitation in an aluminum-scandium alloy. The selected parallel algorithm is called synchronous parallel kinetic Monte Carlo (spkMC). spkMC was implemented on a distributed-memory architecture using the Message Passing Interface (MPI). In spkMC the different processes synchronize at regular points, called the end of a sprint; during a sprint there is no interaction among processes. A checkerboard scheme was adopted to avoid possible conflicts among processes during each sprint. To optimize performance, different implementations were explored, each with a different computation vs. communication strategy. The obtained results show that a rigorous distributed and parallel implementation accurately reproduces the statistical behavior observed with the sequential kMC. The results also show that simulation time can be reduced with a distributed parallelization, but, due to the non-deterministic nature of kMC, achieving significant and scalable parallel gains requires introducing some simplifications and approximations.
Citations: 1
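The sprint/checkerboard structure described in the abstract can be summarized by the skeleton below; the kMC event selection and halo exchange are deliberately left as stubs, and all names are illustrative rather than taken from the authors' implementation.

```c
/* Skeleton of the sprint/checkerboard structure: processes work
 * independently during a sprint and synchronize only at its end. The kMC
 * event selection itself is omitted; names are illustrative. */
#include <mpi.h>

static void do_local_kmc_events(int color) { (void)color; /* event loop omitted */ }
static void exchange_ghost_regions(MPI_Comm comm) { (void)comm; /* halo exchange omitted */ }

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    const int n_sprints = 100;

    for (int sprint = 0; sprint < n_sprints; sprint++) {
        /* Alternate checkerboard colors so neighboring subdomains never
           modify adjacent sites in the same sprint. */
        int color = sprint % 2;
        do_local_kmc_events(color);          /* no communication here */

        exchange_ghost_regions(MPI_COMM_WORLD);
        MPI_Barrier(MPI_COMM_WORLD);         /* end-of-sprint synchronization */
    }

    MPI_Finalize();
    return 0;
}
```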
How I Learned to Stop Worrying and Love In Situ Analytics: Leveraging Latent Synchronization in MPI Collective Algorithms
Pub Date: 2016-09-25 | DOI: 10.1145/2966884.2966920
Authors: Scott Levy, Kurt B. Ferreira, Patrick M. Widener, P. Bridges, Oscar H. Mondragon
Abstract: Scientific workloads running on current extreme-scale systems routinely generate tremendous volumes of data for postprocessing. This data movement has become a serious issue due to its energy cost and the fact that I/O bandwidths have not kept pace with data generation rates. In situ analytics is an increasingly popular alternative in which post-simulation processing is embedded into an application, running as part of the same MPI job. This can reduce data movement costs but introduces a new potential source of interference for the application. Using a validated simulation-based approach, we investigate how best to mitigate the interference from time-shared in situ tasks for a number of key extreme-scale workloads. This paper makes a number of contributions. First, we show that the independent scheduling of in situ analytics tasks can significantly degrade application performance, with slowdowns exceeding 1000%. Second, we demonstrate that the degree of synchronization found in many modern collective algorithms is sufficient to significantly reduce the overheads of this interference to less than 10% in most cases. Finally, we show that many applications already frequently invoke collective operations that use these synchronizing MPI algorithms. Therefore, the synchronization introduced by these MPI collective algorithms can be leveraged to efficiently schedule analytics tasks with minimal changes to existing applications. This paper provides critical analysis and guidance for MPI users and developers on the importance of scheduling in situ analytics tasks. It shows the degree of synchronization needed to mitigate the performance impacts of these time-shared coupled codes and demonstrates how that synchronization can be realized in an extreme-scale environment using modern collective algorithms.
Citations: 5
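One way to realize the co-scheduling idea above with minimal application changes is the standard MPI profiling (PMPI) interface. The paper itself studies the scheduling question by simulation, so the wrapper below is only an illustrative sketch, with run_analytics_step as a placeholder for the analytics work.

```c
/* Sketch of co-scheduling analytics at a synchronizing collective via the
 * standard PMPI profiling interface. run_analytics_step() is a placeholder;
 * the paper evaluates scheduling policies by simulation rather than
 * prescribing this wrapper. */
#include <mpi.h>

static void run_analytics_step(void)
{
    /* time-shared in situ analytics work would go here */
}

/* Intercept MPI_Allreduce: ranks are about to synchronize anyway, so
 * running analytics here adds little extra interference. */
int MPI_Allreduce(const void *sendbuf, void *recvbuf, int count,
                  MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)
{
    run_analytics_step();
    return PMPI_Allreduce(sendbuf, recvbuf, count, datatype, op, comm);
}
```

Built as a library and linked (or preloaded) ahead of the MPI library, such a wrapper overrides the MPI_Allreduce symbol while forwarding to the real implementation through PMPI_Allreduce.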
A Library for Advanced Datatype Programming
Pub Date: 2016-09-25 | DOI: 10.1145/2966884.2966904
Authors: J. Träff
Abstract: We present a library providing functionality beyond the MPI standard for manipulating application data layouts described by MPI derived datatypes. The main contributions are: a) constructors for several new datatypes for describing application-relevant data layouts; b) a set of extent-free constructors that eliminate the need for type resizing; c) new navigation and query functionality for accessing individual data elements in layouts described by datatypes, and for comparing layouts; d) representation of datatype signatures by explicit, associated signature types, as well as functionality for explicit generation of type maps. As a simple application, we implement reduction collectives on noncontiguous but homogeneous derived datatypes. Some of the proposed functionality could be implemented more efficiently within an MPI library.
Citations: 9
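Contribution (b) targets the standard resizing idiom shown below, in which a derived type's extent must be adjusted by hand before it can be used with counts greater than one. Only standard MPI calls appear here, not the library's new constructors, and the matrix dimensions are arbitrary.

```c
/* Standard MPI derived-datatype pattern that extent-free constructors aim
 * to avoid: build a column type, then manually resize its extent so that
 * consecutive columns can be addressed with count > 1. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    const int rows = 4, cols = 5;
    MPI_Datatype column, column_resized;

    /* One column of a rows x cols row-major matrix of doubles. */
    MPI_Type_vector(rows, 1, cols, MPI_DOUBLE, &column);

    /* Shrink the extent to one double so count > 1 steps to the next column. */
    MPI_Type_create_resized(column, 0, sizeof(double), &column_resized);
    MPI_Type_commit(&column_resized);

    MPI_Aint lb, extent;
    MPI_Type_get_extent(column_resized, &lb, &extent);
    printf("lb=%ld extent=%ld\n", (long)lb, (long)extent);

    MPI_Type_free(&column);
    MPI_Type_free(&column_resized);
    MPI_Finalize();
    return 0;
}
```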
The Potential of Diffusive Load Balancing at Large Scale
Pub Date: 2016-09-25 | DOI: 10.1145/2966884.2966887
Authors: Matthias Lieber, Kerstin Gößner, W. Nagel
Abstract: Dynamic load balancing with diffusive methods is known to provide minimal load transfer and requires communication between neighboring nodes only. These are very attractive properties for highly parallel systems. We compare diffusive methods with state-of-the-art geometrical and graph-based partitioning methods on thousands of nodes. When load-balancing overheads (i.e., repartitioning computation time and migration) have to be minimized, diffusive methods provide substantial benefits.
Citations: 10
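A minimal first-order diffusion step on a 1-D ring gives the flavor of the neighbor-only communication the abstract highlights. The diffusion coefficient alpha = 0.25 and the artificial initial imbalance are assumptions for illustration; the paper's schemes operate on real machine topologies rather than a ring.

```c
/* One first-order diffusion step on a 1-D ring of processes: each rank
 * exchanges its scalar load with both neighbors and moves a fraction alpha
 * of each load difference. Illustrative sketch only. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double load = (double)(rank % 4) * 10.0;   /* artificial imbalance */
    const double alpha = 0.25;                 /* assumed diffusion coefficient */
    int left  = (rank - 1 + size) % size;
    int right = (rank + 1) % size;

    double load_left, load_right;
    MPI_Sendrecv(&load, 1, MPI_DOUBLE, left,  0,
                 &load_right, 1, MPI_DOUBLE, right, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Sendrecv(&load, 1, MPI_DOUBLE, right, 1,
                 &load_left, 1, MPI_DOUBLE, left,  1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* Positive transfer means this rank gives work away to that neighbor;
       each neighbor computes the mirror-image update, so load is conserved. */
    double to_left  = alpha * (load - load_left);
    double to_right = alpha * (load - load_right);
    load -= (to_left + to_right);

    printf("rank %d: new load %.1f\n", rank, load);
    MPI_Finalize();
    return 0;
}
```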
Proceedings of the 23rd European MPI Users' Group Meeting
Pub Date: 2016-09-25 | DOI: 10.1145/2966884
Authors: J. Dongarra, Daniel Holmes, A. Collis, J. Träff, Lorna Smith
Abstract: EuroMPI is the preeminent meeting for users, developers and researchers to interact and discuss new developments and applications of message-passing parallel computing, in particular in and related to the Message Passing Interface (MPI). The annual meeting has a long, rich tradition; previous meetings were held in Madrid (2013), Vienna (2012), Santorini (2011), Stuttgart (2010), Espoo (2009), Dublin (2008), Paris (2007), Bonn (2006), Sorrento (2005), Budapest (2004), Venice (2003), Linz (2002), Santorini (2001), Balatonfured (2000), Barcelona (1999), Liverpool (1998), Cracow (1997), Munich (1996), Lyon (1995), and Rome (1994).
Citations: 0