Proceedings of the 23rd European MPI Users' Group Meeting: Latest Articles

FFT data distribution in plane-waves DFT codes. A case study from Quantum ESPRESSO
Pub Date: 2016-09-25 | DOI: 10.1145/2966884.2966892
F. Affinito, C. Cavazzoni
Abstract: Density Functional Theory calculations with plane waves and pseudopotentials represent one of the most important simulation techniques in high performance computing. Together with parallel linear algebra (ZGEMM and matrix diagonalization), the most important bottleneck results from the Fast Fourier Transform (FFT), required, for example, when the local potential is applied to the wavefunction. In these calculations, the cutoff on the plane waves is reflected in a spherical domain for the FFT. After a 1D FFT is performed on pencils distributed among processors, the data is transposed with an MPI_Alltoall and a 2D FFT is executed [2]. Typically the workload of the FFT is not particularly high, since grid sizes do not exceed (10²–10³)³. However, the load distribution is crucial, and the consequent impact of collective communication becomes a critical factor for achieving high parallel efficiency. Quantum ESPRESSO [3] is one of the most widely used plane-wave DFT codes in the materials-science community. It has been successfully ported and optimized on a large number of HPC infrastructures all over the world. The parallel structure of Quantum ESPRESSO is based mainly on several layers of MPI communicators, plus a finer-grained OpenMP parallelization. Recently, the parallelization structure of the FFT was deeply refactored. The combination of two data distributions, bands and task groups, allows the underlying hardware to be filled hierarchically and two different layers of communication to be tuned. In particular, given sufficient memory, tuning the number of task groups makes all the data required to perform a single 3D FFT fit locally, reducing the impact of the MPI_Alltoall between the 1D and 2D FFTs.
To better check the results of the parametrization of the parallel distributions, a mini-app [1] containing only the FFT kernel was extracted from the Quantum ESPRESSO distribution. This mini-app is also important for future code-design activity on novel architectures. We present and discuss the profiling data obtained from the QE-FFT mini-app and the impact of the choice of parallelization parameters on the communication pattern.
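The pencil-to-plane repartitioning around the MPI_Alltoall described in the abstract can be sketched in plain Python. This is a toy serial emulation, not Quantum ESPRESSO code; the rank count, grid size, and helper names are invented for illustration:

```python
# Toy serial emulation of the pencil -> plane repartitioning around the
# MPI_Alltoall in a distributed 3D FFT. Illustrative only, not QE code.

def alltoall(send):
    """send[src][dst] -> recv[dst][src], like MPI_Alltoall over P ranks."""
    P = len(send)
    return [[send[src][dst] for src in range(P)] for dst in range(P)]

P = N = 4                       # 4 ranks, a 4x4x4 grid
# Rank r owns the z-pencils with x == r. After the 1D FFTs along z (not
# shown), each grid point is tagged here by its (x, y, z) coordinates.
# Rank r sends the z == dst slice of all its pencils to rank dst:
send = [[[(r, y, dst) for y in range(N)] for dst in range(P)]
        for r in range(P)]
recv = alltoall(send)

# After the transpose, rank r holds the complete x-y plane z == r,
# ready for the local 2D FFTs.
plane1 = sorted(pt for part in recv[1] for pt in part)
assert plane1 == [(x, y, 1) for x in range(N) for y in range(N)]
```

Tuning the number of task groups, as the abstract describes, changes how many ranks participate in each such transpose and therefore the size and cost of the all-to-all.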
Citations: 2
The MIG Framework: Enabling Transparent Process Migration in Open MPI
Pub Date: 2016-09-25 | DOI: 10.1145/2966884.2966903
F. Reghenzani, G. Pozzi, G. Massari, Simone Libutti, W. Fornaciari
Abstract: This paper introduces the mig framework: an Open MPI extension that transparently supports the migration of application processes across the nodes of a distributed High-Performance Computing (HPC) system. The framework provides mechanisms on top of which resource managers can implement policies to react to hardware faults, address performance variability, improve resource utilization, and perform fine-grained load balancing and power/thermal management. Compared to other state-of-the-art approaches, the mig framework does not require changes in the application code. Moreover, it is highly maintainable, since it is a largely self-contained solution that required very few changes to existing Open MPI frameworks. Experimental results show that the proposed extension does not introduce significant overhead in application execution, while the penalty of performing a migration can be properly accounted for by a resource manager.
Citations: 6
Modeling MPI Communication Performance on SMP Nodes: Is it Time to Retire the Ping Pong Test
Pub Date: 2016-09-25 | DOI: 10.1145/2966884.2966919
W. Gropp, Luke N. Olson, Philipp Samfass
Abstract: The "postal" model of communication [3, 8], T = α + βn, for sending n bytes of data between two processes with latency α and bandwidth 1/β, is perhaps the most commonly used communication performance model in parallel computing. It is often used in developing and evaluating parallel algorithms in high-performance computing, and was an effective model when first proposed; consequently, numerous "ping pong" benchmarks have been developed to measure its parameters. However, with the advent of multicore nodes connected to a single (or a few) network interfaces, the model has become a poor match for modern hardware. In this paper, we present a simple three-parameter model that better captures the behavior of current parallel computing systems, and we demonstrate its accuracy on several systems. In support of this model, which we call the max-rate model, we have developed an open-source benchmark that can be used to determine the model parameters.
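The abstract gives the postal model exactly; the sketch below pairs it with one plausible form of a three-parameter, max-rate-style model in which the aggregate rate of k concurrent process pairs per node saturates at a node injection rate. The exact functional form used in the paper may differ, and alpha, r_core, and r_node are invented illustrative values:

```python
# Postal model (from the abstract) vs. a plausible max-rate-style model.
# Parameter values are invented for illustration.

def t_postal(n, alpha, beta):
    """Two-process time to send n bytes: T = alpha + beta * n."""
    return alpha + beta * n

def t_maxrate(n, k, alpha, r_core, r_node):
    """Time for k process pairs per node, n bytes each, assuming the
    aggregate rate saturates at the node injection rate r_node."""
    rate = min(k * r_core, r_node)       # achievable aggregate bytes/s
    return alpha + (k * n) / rate

alpha, r_core, r_node = 1e-6, 1e9, 2e9   # 1 us latency, 1 GB/s, 2 GB/s

# Below saturation, per-pair time is independent of k (postal-like);
# beyond k = 2 the shared node link caps the rate and per-pair time grows:
assert t_maxrate(1e6, 2, alpha, r_core, r_node) == \
       t_maxrate(1e6, 1, alpha, r_core, r_node)
assert t_maxrate(1e6, 4, alpha, r_core, r_node) > \
       t_maxrate(1e6, 2, alpha, r_core, r_node)
```

A single ping-pong measurement only exercises the k = 1 regime, which is exactly why it can miss the node-level saturation such a model captures.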
Citations: 36
Optimizing PARSEC for Knights Landing
Pub Date: 2016-09-25 | DOI: 10.1145/2966884.2966895
A. Malhanov, Ariel J. Biller, Michael Chuvelev
Abstract: PARSEC is a massively parallel Density Functional Theory (DFT) code. As part of a modernization effort targeting the new Intel Knights Landing platform, we adapted the main computational kernel, represented as high-order finite-difference stencils, to use hybrid MPI and OpenMP runtimes. We also employed MPI-3 non-blocking neighborhood collectives for the halo exchange. We present performance data on the Knights Landing platform, including MPI traces portraying our exploration of communication-computation overlap in a hybrid MPI/OpenMP application, and the fine-tuning of the interplay between load balancing, which is static at the MPI level and dynamic in OpenMP.
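The halo-exchange pattern that MPI-3 neighborhood collectives express can be illustrated with a toy serial emulation: each rank in a periodic 1D ring exchanges its edge cells with its two neighbors, then applies a finite-difference stencil. This is an invented sketch (a 3-point stencil in 1D), not PARSEC's high-order 3D kernel:

```python
# Toy serial emulation of a neighborhood halo exchange on a periodic 1D
# ring of ranks, followed by a 3-point stencil update. Illustrative only.

P, n = 4, 5                              # ranks, cells per rank
data = [[float(r * n + i) for i in range(n)] for r in range(P)]

# Neighborhood exchange: each rank receives its neighbors' edge cells
# (what MPI_Neighbor_alltoall does on a periodic Cartesian topology).
halo_lo = [data[(r - 1) % P][-1] for r in range(P)]   # from left neighbor
halo_hi = [data[(r + 1) % P][0] for r in range(P)]    # from right neighbor

# Stencil update u'_i = (u_{i-1} + u_i + u_{i+1}) / 3 using the halos.
new = []
for r in range(P):
    ext = [halo_lo[r]] + data[r] + [halo_hi[r]]       # local array + halos
    new.append([(ext[i - 1] + ext[i] + ext[i + 1]) / 3.0
                for i in range(1, n + 1)])

# Interior cells depend only on local data and match the serial result:
assert new[1][2] == (data[1][1] + data[1][2] + data[1][3]) / 3.0
```

With the non-blocking collective variant, the exchange can be posted first and the interior cells updated while the halo messages are in flight, which is the communication-computation overlap the traces in the paper explore.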
Citations: 0
MPI Sessions: Leveraging Runtime Infrastructure to Increase Scalability of Applications at Exascale
Pub Date: 2016-09-25 | DOI: 10.1145/2966884.2966915
Daniel J. Holmes, K. Mohror, Ryan E. Grant, A. Skjellum, M. Schulz, Wesley Bland, J. Squyres
Abstract: MPI includes all processes in MPI_COMM_WORLD; this is untenable for reasons of scale, resiliency, and overhead. This paper offers a new approach that extends MPI with a new concept called Sessions, making two key contributions: a tighter integration with the underlying runtime system, and a scalable route to communication groups. This is a fundamental change in how MPI processes are organised and addressed, one that removes well-known scalability barriers by no longer requiring the global communicator MPI_COMM_WORLD.
Citations: 20
On the Expected and Observed Communication Performance with MPI Derived Datatypes
Pub Date: 2016-09-25 | DOI: 10.1145/2966884.2966905
Alexandra Carpen-Amarie, S. Hunold, J. Träff
Abstract: We examine natural expectations on communication performance when using MPI derived datatypes, in comparison to the baseline, "raw" performance of communicating simple, non-contiguous data layouts. We show that common MPI libraries sometimes violate these datatype performance expectations and discuss why this happens, but we also show cases where MPI libraries perform well. Our findings are in many ways surprising and disappointing. First, the performance of derived datatypes is sometimes worse than that of the semantically equivalent packing and unpacking using the corresponding MPI functionality. Second, the communication performance equivalence stated in the MPI standard between a single contiguous datatype and the repetition of its constituent datatype does not hold universally. Third, the heuristics typically employed by MPI libraries at type-commit time are insufficient to enforce natural performance guidelines, and better type-normalization heuristics could have a significant performance impact. We show cases where all the MPI type constructors are necessary to achieve the expected performance for certain data layouts.
We describe our benchmarking approach for verifying the datatype performance guidelines, and present extensive verification results for different MPI libraries.
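The semantic equivalence underlying the first guideline, that sending a strided layout via a derived datatype should be no slower than explicitly packing it into a contiguous buffer, can be illustrated in plain Python. This is a toy model of what the two paths transfer, with invented helper names, not an MPI benchmark:

```python
# Toy illustration of the equivalence the paper benchmarks: a strided
# derived datatype versus an explicit pack/unpack of the same layout.
# Helper names are invented; list slicing stands in for MPI machinery.

def gather_strided(buf, count, blocklen, stride):
    """Elements an MPI_Type_vector(count, blocklen, stride)-described
    send would pick up from buf."""
    out = []
    for b in range(count):
        out.extend(buf[b * stride : b * stride + blocklen])
    return out

def pack_then_send(buf, count, blocklen, stride):
    """Semantically equivalent manual packing (as with MPI_Pack) into a
    contiguous buffer; the receiver unpacks in the same order."""
    return [buf[b * stride + j]
            for b in range(count) for j in range(blocklen)]

buf = list(range(20))
a = gather_strided(buf, count=4, blocklen=2, stride=5)
b = pack_then_send(buf, count=4, blocklen=2, stride=5)
assert a == b == [0, 1, 5, 6, 10, 11, 15, 16]
```

The paper's finding is that although both paths move exactly these elements, real MPI libraries sometimes make the datatype path slower than the manual pack, violating the natural expectation.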
Citations: 10
Effective Calculation with Halo communication using Halo Functions
Pub Date: 2016-09-25 | DOI: 10.1145/2966884.2966893
K. Fukazawa, Toshiya Takami, T. Soga, Yoshiyuki Morie, T. Nanri
Abstract: Halo communication reduces parallel scalability. To mitigate this, we previously introduced a "halo thread" into our simulation code, but that did not fundamentally solve the problem under strong scaling. In this study, we developed halo functions that perform halo communication efficiently, allowing calculation and communication to proceed in a pipeline, and obtained good performance.
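The calculation/communication pipeline the abstract refers to typically follows a standard ordering: post the halo exchange, update the interior cells that need no halo data while the messages are in flight, complete the exchange, then update the boundary cells. The sketch below is a serial emulation of that ordering with invented names, not the paper's halo functions:

```python
# Toy sketch of overlapping halo communication with computation:
# post exchange -> update interior -> wait -> update boundary.
# Serial emulation; function names are illustrative.

log = []

def post_halo_exchange():
    log.append("post exchange")
    return "request"                     # stands in for an MPI request

def wait(request):
    log.append("wait " + request)        # like MPI_Wait on the request

def update_interior():
    log.append("update interior")        # overlaps the in-flight exchange

def update_boundary():
    log.append("update boundary")        # needs the received halo values

req = post_halo_exchange()
update_interior()                        # computation hides communication
wait(req)
update_boundary()

assert log == ["post exchange", "update interior", "wait request",
               "update boundary"]
```

Under strong scaling the interior shrinks relative to the boundary, so the window for hiding communication narrows, which is the regime where such pipelining matters most.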
Citations: 1
Generalisation of Recursive Doubling for AllReduce
Pub Date: 2016-09-25 | DOI: 10.1145/2966884.2966913
M. Ruefenacht, Mark Bull, S. Booth
Abstract: The performance of AllReduce is crucial at scale. The recursive-doubling-with-pairwise-exchange algorithm theoretically achieves O(log₂ N) scaling for short messages with N peers, but is limited by improvements in network latency. A multi-way exchange can be implemented using message pipelining, which is easier to improve than latency. Using our method, recursive multiplying, we show reductions in AllReduce execution time of between 8% and 40% over recursive doubling on a Cray XC30.
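The baseline that recursive multiplying generalises can be simulated serially: in round d, every rank exchanges its partial value with rank r XOR 2^d and both combine, so after log₂(P) rounds every rank holds the full reduction. This sketch shows plain recursive doubling, not the paper's multi-way method:

```python
# Serial simulation of recursive doubling for AllReduce over P = 2^k
# "ranks". Each round pairs rank r with rank r XOR d; the list
# comprehension models the simultaneous pairwise exchange.

def allreduce_recursive_doubling(values, op):
    P = len(values)
    assert P & (P - 1) == 0, "P must be a power of two"
    vals = list(values)
    d = 1
    while d < P:                         # log2(P) rounds in total
        vals = [op(vals[r], vals[r ^ d]) for r in range(P)]
        d *= 2
    return vals

result = allreduce_recursive_doubling([1, 2, 3, 4, 5, 6, 7, 8],
                                      lambda a, b: a + b)
assert result == [36] * 8                # every rank has the global sum
```

Recursive multiplying replaces each pairwise exchange with a k-way exchange, trading more messages per round (pipelined) for fewer latency-bound rounds.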
Citations: 5
Architecting Malleable MPI Applications for Priority-driven Adaptive Scheduling
Pub Date: 2016-09-25 | DOI: 10.1145/2966884.2966907
Pierre Lemarinier, K. Hasanov, S. Venugopal, K. Katrinis
Abstract: Future supercomputers will need to support both traditional HPC applications and Big Data/High-Performance Analysis applications seamlessly in a common environment. This motivates traditional job-scheduling systems to support malleable jobs, whose allocations can dynamically change in size, in order to adapt the amount of resources to the actual current needs of different applications. It also calls for future HPC applications to adapt to this environment and provide some level of malleability, releasing underutilized resources to other tasks. In this paper, we present and compare two methodologies for supporting such malleable MPI applications: (1) checkpoint/restart using the SCR library, and (2) dynamic data redistribution using the ULFM API and runtime. We examine their effects on application execution times as well as their impact on resource management.
Citations: 15
CAF Events Implementation Using MPI-3 Capabilities
Pub Date: 2016-09-25 | DOI: 10.1145/2966884.2966916
A. Fanfarillo, Jeff R. Hammond
Abstract: MPI-3.1 is currently the most recent version of the MPI standard. It adds important extensions to MPI-2, including a simplified semantics for the one-sided communication routines and a new tool interface capable of exposing performance data of the MPI implementation to users and libraries. These and other new features make MPI-3 a good candidate for being the transport layer of PGAS languages like Coarray Fortran. OpenCoarrays, the free coarray implementation used by the GNU Fortran compiler, implements almost all Coarray Fortran 2008 features and several Coarray Fortran 2015 features on top of MPI-3. Among the Fortran 2015 features, one of the most relevant for performance is events: a fine-grained synchronization mechanism based on a limited implementation of the well-known semaphore primitives. In this paper, we analyze two possible implementations of events using MPI-3 features and show how to dynamically select the best implementation according to the capabilities provided by the MPI implementation. We also show how events can improve overall performance by reducing idle times in parallel applications.
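The semaphore-like semantics the abstract describes (EVENT POST increments a counter; EVENT WAIT blocks until it is positive, then decrements it) can be modeled in a few lines. Here Python's threading.Semaphore stands in for the MPI-3-based implementations the paper compares; this is a semantics sketch with invented names, not OpenCoarrays code:

```python
# Toy model of CAF events as limited semaphores: one image posts an
# event after producing data, another waits on it before consuming.
# threading stands in for MPI-3 machinery; names are illustrative.

import threading

done = threading.Semaphore(0)            # the "event" variable, count = 0
order = []

def producer():
    order.append("work")                 # produce data ...
    done.release()                       # ... then EVENT POST

def consumer():
    done.acquire()                       # EVENT WAIT blocks until posted
    order.append("consume")              # safe to use the data now

t = threading.Thread(target=producer)
c = threading.Thread(target=consumer)
c.start(); t.start()
t.join(); c.join()
assert order == ["work", "consume"]      # post always precedes consume
```

Unlike a full barrier, only the waiting image blocks, which is how events reduce idle time relative to coarser synchronization.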
Citations: 6