Evaluation of Parallel Communication Models in Nekbone, a Nek5000 Mini-Application

2015 IEEE International Conference on Cluster Computing. Pub Date: 2015-09-08. DOI: 10.1109/CLUSTER.2015.131

I. Ivanov, Jing Gong, D. Akhmetova, I. Peng, S. Markidis, E. Laure, Rui Machado, M. Rahn, Valeria Bartsch, A. Hart, P. Fischer


Citations: 12

Abstract

Nekbone is a proxy application of Nek5000, a scalable Computational Fluid Dynamics (CFD) code used for modelling incompressible flows. Several international co-design centers use the Nekbone mini-application to explore new concepts in computer science and to evaluate their performance. We present the design and implementation of a new communication kernel in Nekbone, with the goal of studying the performance of different parallel communication models. First, we developed a new MPI blocking communication kernel to solve Nekbone problems on a three-dimensional Cartesian mesh and process topology. The new MPI implementation delivers a 13% performance improvement over the original implementation. The new MPI communication kernel consists of approximately 500 lines of code, against 7,000 lines in the original, which allows experimentation with new approaches to parallel communication in Nekbone. Second, we changed the blocking MPI communication in the new kernel to non-blocking MPI communication. Third, we developed a new Partitioned Global Address Space (PGAS) communication kernel based on the GPI-2 library. This approach reduces synchronization among neighbor processes and is on average 3% faster than the new non-blocking MPI approach. In our tests on 8,192 processes, the GPI-2 communication kernel is 3% faster than the new MPI non-blocking communication kernel. In addition, we use OpenMP in all versions of the new communication kernel. Finally, we highlight the future steps for using the new communication kernel in the parent application, Nek5000.
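
As an illustration of the three-dimensional Cartesian process topology mentioned in the abstract, the sketch below computes the six face neighbors of a rank in a process grid. It is not taken from the paper's kernel: the function names (`rank_to_coords`, `coords_to_rank`, `face_neighbors`) and the `periodic` flag are hypothetical, and the row-major rank ordering is an assumption chosen to match the ordering that MPI Cartesian communicators use. A communication kernel of the kind described would exchange boundary data with these neighbors.

```python
from typing import Dict, Optional, Tuple

Dims = Tuple[int, int, int]


def rank_to_coords(rank: int, dims: Dims) -> Tuple[int, int, int]:
    """Map a rank to (x, y, z), row-major: the last dimension varies fastest."""
    nx, ny, nz = dims
    if not 0 <= rank < nx * ny * nz:
        raise ValueError("rank out of range for this process grid")
    return rank // (ny * nz), (rank // nz) % ny, rank % nz


def coords_to_rank(coords: Tuple[int, int, int], dims: Dims) -> int:
    """Inverse of rank_to_coords for in-range coordinates."""
    x, y, z = coords
    _, ny, nz = dims
    return (x * ny + y) * nz + z


def face_neighbors(rank: int, dims: Dims,
                   periodic: bool = False) -> Dict[str, Optional[int]]:
    """Return the ranks of the six face neighbors, keyed 'x-', 'x+', ... 'z+'.

    A neighbor that would fall outside a non-periodic grid is reported as None.
    """
    coords = rank_to_coords(rank, dims)
    neighbors: Dict[str, Optional[int]] = {}
    for axis, name in enumerate("xyz"):
        for step, sign in ((-1, "-"), (1, "+")):
            shifted = list(coords)
            shifted[axis] += step
            if periodic:
                # Wrap around the grid edge (illustrative assumption).
                shifted[axis] %= dims[axis]
            elif not 0 <= shifted[axis] < dims[axis]:
                neighbors[name + sign] = None
                continue
            neighbors[name + sign] = coords_to_rank(
                (shifted[0], shifted[1], shifted[2]), dims)
    return neighbors


if __name__ == "__main__":
    # Rank 21 sits at coordinates (1, 1, 1) in a 4 x 4 x 4 process grid.
    print(rank_to_coords(21, (4, 4, 4)))
    print(face_neighbors(21, (4, 4, 4)))
```

In a blocking kernel each exchange with a neighbor completes before the next begins; a non-blocking or one-sided variant would post all six exchanges first and wait for completion afterwards, which is the kind of synchronization difference the abstract compares.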

