Title: MPI usage at NERSC: Present and Future
Authors: A. Koniges, B. Cook, J. Deslippe, T. Kurth, H. Shan
DOI: 10.1145/2966884.2966894
Abstract: In this poster, we describe how MPI is used at the National Energy Research Scientific Computing Center (NERSC). NERSC is the production high-performance computing center for the US Department of Energy, with more than 5000 users and 800 distinct projects. Through a variety of tools (e.g., a user survey and application team collaborations), we determine how MPI is used on our latest systems, with a particular focus on advanced features and on how early applications intend to use MPI on NERSC's upcoming Intel Knights Landing (KNL) many-core system, one of the first to be deployed. In the poster, we also compare the usage of MPI to exascale developmental programming models such as UPC++ and HPX, with an eye on what features and extensions to MPI are plausible and useful for NERSC users. We also discuss perceived shortcomings of MPI, and why certain groups use other parallel programming models on the systems. In addition to a broad survey of the NERSC HPC population, we follow the evolution of a few key application codes that are being highly optimized for the KNL architecture using advanced OpenMP techniques. We study how these highly optimized on-node proxy apps and full applications start to make the transition to full hybrid MPI+OpenMP implementations on the self-hosted KNL system.

Title: Introducing Task-Containers as an Alternative to Runtime-Stacking
Authors: Jean-Baptiste Besnard, Julien Adam, S. Shende, Marc Pérache, Patrick Carribault, Julien Jaeger
DOI: 10.1145/2966884.2966910
Abstract: The advent of many-core architectures poses new challenges to the MPI programming model, which was designed for distributed-memory message passing. It is now clear that MPI will have to evolve in order to exploit shared-memory parallelism, either by collaborating with other programming models (MPI+X) or by introducing new shared-memory approaches. This paper considers extensions to C and C++ that make it possible for MPI processes to run inside threads. More generally, a thread-local storage (TLS) library is developed to simplify the collocation of arbitrary tasks and services in a shared-memory context called a task-container. The paper discusses how such containers simplify model and service mixing at the OS-process level, eventually easing the collocation of arbitrary tasks with MPI processes in a runtime-agnostic fashion and opening alternatives to runtime stacking.

{"title":"Performance comparison of Eulerian kinetic Vlasov code between flat-MPI parallelism and hybrid parallelism on Fujitsu FX100 supercomputer","authors":"T. Umeda, K. Fukazawa","doi":"10.1145/2966884.2966891","DOIUrl":"https://doi.org/10.1145/2966884.2966891","url":null,"abstract":"The present study deals with the Vlasov simulation code, which solves the first-principle kinetic equations called the Vlasov equation for space plasma. In the present study, a five-dimensional Vlasov code with two spatial dimension and three velocity dimensions is parallelized with two methods, the flat-MPI and the MPI-OpenMP hybrid parallelism. The two types of the parallel Vlasov code are benchmarked on massively-parallel supercomputer Fujitsu FX100, which has been developed with the second-generation post architecture of the K computer in Japan. In the present performance comparison, we vary the number of threads per nodes from 1 (the flat-MPI parallelism) to 32. The result shows that the OpenMP-MPI hybrid parallelism outperforms the flat-MPI for any number of compute nodes. There is an optimum number of threads per nodes depending on the number of compute nodes. It is shown that the optimum number of threads per node becomes larger on a larger number of compute nodes. This is because the communication time of an MPI collective communication subroutine, which is used for convergence check of iterative methods, can be reduced by decreasing the total number of processes with the OpenMP-MPI hybrid parallelism.","PeriodicalId":264069,"journal":{"name":"Proceedings of the 23rd European MPI Users' Group Meeting","volume":"35 5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126333810","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Space Performance Tradeoffs in Compressing MPI Group Data Structures","authors":"Sameer Kumar, P. Heidelberger, C. Stunkel","doi":"10.1145/2966884.2966911","DOIUrl":"https://doi.org/10.1145/2966884.2966911","url":null,"abstract":"MPI is a popular programming paradigm on parallel machines today. MPI libraries sometimes use O(N) data structures to implement MPI functionality. The IBM Blue Gene/Q machine has 16 GB memory per node. If each node runs 32 MPI processes, only 512 MB is available per process, requiring the MPI library to be space efficient. This scenario will become severe in a future Exascale machine with tens of millions of cores and MPI endpoints. We explore techniques to compress the dense O(N) mapping data structures that map the logical process ID to the global rank. Our techniques minimize topological communicator mapping state by replacing table lookups with a mapping function. We also explore caching schemes with performance results to optimize overheads of the mapping functions for recent translations in multiple MPI micro-benchmarks, and the 3D FFT and Algebraic Multi Grid application benchmarks.","PeriodicalId":264069,"journal":{"name":"Proceedings of the 23rd European MPI Users' Group Meeting","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129727665","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Infrastructure and API Extensions for Elastic Execution of MPI Applications
Authors: Isaías A. Comprés Ureña, Ao Mo-Hellenbrand, M. Gerndt, H. Bungartz
DOI: 10.1145/2966884.2966917
Abstract: Dynamic process support was added to MPI in version 2.0 of the standard. This feature has not been widely used by application developers, in part due to the performance cost and limitations of the spawn operation. In this paper, we propose an extension to MPI that consists of four new operations. These operations allow an application to be initialized in an elastic mode of execution and to enter an adaptation window when necessary, during which resources are incorporated into or released from the application's world communicator. A prototype solution based on the MPICH library and the SLURM resource manager is presented and evaluated alongside an elastic scientific application that makes use of the new MPI extensions. The cost of the new operations is shown to be negligible, mainly due to their latency-hiding design, leaving the application's data redistribution time as the only significant performance cost.

{"title":"Distributed Memory Implementation Strategies for the kinetic Monte Carlo Algorithm","authors":"António Esteves, Alfredo Moura","doi":"10.1145/2966884.2966908","DOIUrl":"https://doi.org/10.1145/2966884.2966908","url":null,"abstract":"This paper presents strategies to parallelize a previously implemented kinetic Monte Carlo (kMC) algorithm. The process under simulation is the precipitation in an aluminum scandium alloy. The selected parallel algorithm is called synchronous parallel kinetic Monte Carlo (spkMC). spkMC was implemented with a distributed memory architecture and using the Message Passing Interface (MPI) communication protocol. In spkMC the different processes synchronize at regular points, called end of sprint. During a sprint there is no interaction among processes. A checker board scheme was adopted to avoid possible conflicts among processes during each sprint. To optimize performance different implementations were explored, each one with a different computation vs. communication strategy. The obtained results prove that a rigorous distributed and parallel implementation reproduces accurately the statistical behavior observed with the sequential kMC. Results also prove that simulation time can be reduced with a distributed parallelization but, due to the non-deterministic nature of kMC, significant and scalable gains in parallelization oblige to introduce some simplifications and approximations.","PeriodicalId":264069,"journal":{"name":"Proceedings of the 23rd European MPI Users' Group Meeting","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133176768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: How I Learned to Stop Worrying and Love In Situ Analytics: Leveraging Latent Synchronization in MPI Collective Algorithms
Authors: Scott Levy, Kurt B. Ferreira, Patrick M. Widener, P. Bridges, Oscar H. Mondragon
DOI: 10.1145/2966884.2966920
Abstract: Scientific workloads running on current extreme-scale systems routinely generate tremendous volumes of data for postprocessing. This data movement has become a serious issue due to its energy cost and the fact that I/O bandwidths have not kept pace with data generation rates. In situ analytics is an increasingly popular alternative in which post-simulation processing is embedded into an application, running as part of the same MPI job. This can reduce data movement costs but introduces a new potential source of interference for the application. Using a validated simulation-based approach, we investigate how best to mitigate the interference from time-shared in situ tasks for a number of key extreme-scale workloads. This paper makes several contributions. First, we show that the independent scheduling of in situ analytics tasks can significantly degrade application performance, with slowdowns exceeding 1000%. Second, we demonstrate that the degree of synchronization found in many modern collective algorithms is sufficient to reduce the overhead of this interference to less than 10% in most cases. Finally, we show that many applications already frequently invoke collective operations that use these synchronizing MPI algorithms; the synchronization introduced by these algorithms can therefore be leveraged to efficiently schedule analytics tasks with minimal changes to existing applications. This paper provides critical analysis and guidance for MPI users and developers on the importance of scheduling in situ analytics tasks. It shows the degree of synchronization needed to mitigate the performance impacts of these time-shared coupled codes and demonstrates how that synchronization can be realized in an extreme-scale environment using modern collective algorithms.

{"title":"A Library for Advanced Datatype Programming","authors":"J. Träff","doi":"10.1145/2966884.2966904","DOIUrl":"https://doi.org/10.1145/2966884.2966904","url":null,"abstract":"We present a library providing functionality beyond the MPI standard for manipulating application data layouts described by MPI derived datatypes. The main contributions are: a) Constructors for several, new datatypes for describing application relevant data layouts. b) A set of extent-free constructors that eliminate the need for type resizing. c) New navigation and query functionality for accessing individual data elements in layouts described by datatypes, and for comparing layouts. d) Representation of datatype signatures by explicit, associated signature types, as well as functionality for explicit generation of type maps. As a simple application, we implement reduction collectives on noncontiguous, but homogeneous derived datatypes. Some of the proposed functionality could be implemented more efficiently within an MPI library.","PeriodicalId":264069,"journal":{"name":"Proceedings of the 23rd European MPI Users' Group Meeting","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130906533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Potential of Diffusive Load Balancing at Large Scale","authors":"Matthias Lieber, Kerstin Gößner, W. Nagel","doi":"10.1145/2966884.2966887","DOIUrl":"https://doi.org/10.1145/2966884.2966887","url":null,"abstract":"Dynamic load balancing with diffusive methods is known to provide minimal load transfer and requires communication between neighbor nodes only. These are very attractive properties for highly parallel systems. We compare diffusive methods with state-of-the-art geometrical and graph-based partitioning methods on thousands of nodes. When load balancing overheads, i.e. repartitioning computation time and migration, have to be minimized, diffusive methods provide substantial benefits.","PeriodicalId":264069,"journal":{"name":"Proceedings of the 23rd European MPI Users' Group Meeting","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132098519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Proceedings of the 23rd European MPI Users' Group Meeting
Authors: J. Dongarra, Daniel Holmes, A. Collis, J. Träff, Lorna Smith
DOI: 10.1145/2966884
Abstract: EuroMPI is the preeminent meeting for users, developers and researchers to interact and discuss new developments and applications of message-passing parallel computing, in particular in and related to the Message Passing Interface (MPI). The annual meeting has a long, rich tradition, and has been held in Madrid (2013), Vienna (2012), Santorini (2011), Stuttgart (2010), Espoo (2009), Dublin (2008), Paris (2007), Bonn (2006), Sorrento (2005), Budapest (2004), Venice (2003), Linz (2002), Santorini (2001), Balatonfüred (2000), Barcelona (1999), Liverpool (1998), Cracow (1997), Munich (1996), Lyon (1995), and Rome (1994).