Proceedings of the 23rd European MPI Users' Group Meeting: Latest Articles

FFT data distribution in plane-waves DFT codes. A case study from Quantum ESPRESSO
Pub Date: 2016-09-25 | DOI: 10.1145/2966884.2966892
F. Affinito, C. Cavazzoni
Abstract: Density Functional Theory calculations with plane waves and pseudopotentials represent one of the most important simulation techniques in high performance computing. Together with parallel linear algebra (ZGEMM and matrix diagonalization), the most important bottleneck results from the Fast Fourier Transform (FFT), required, for example, when the local potential is applied to the wavefunction. In these calculations, the cutoff on the plane waves is reflected in a spherical domain for the FFT. After a 1D FFT is performed on pencils distributed among processors, the data is transposed with an MPI_Alltoall and a 2D FFT is executed [2]. Typically the workload of the FFT is not particularly high, since grid sizes do not exceed (10²–10³)³. However, the load distribution is crucial, and the consequent impact of collective communication becomes a critical factor for achieving high parallel efficiency. Quantum ESPRESSO [3] is one of the most widely used plane-wave DFT codes in the materials-science community. It has been successfully ported and optimized on a large number of HPC infrastructures all over the world. The parallel structure of Quantum ESPRESSO is based mainly on several layers of MPI communicators, plus a finer-grained OpenMP parallelization. Recently, the parallelization structure of the FFT was deeply refactored. The combination of two data distributions, bands and task groups, allows the underlying hardware to be filled hierarchically and two different layers of communication to be tuned. In particular, given sufficient memory, tuning the number of task groups makes all the data required to perform a single 3D FFT fit locally, reducing the impact of the MPI_Alltoall between the 1D and 2D FFTs.
To better check the results of the parametrization of the parallel distributions, a mini-app [1] containing only the FFT kernel was extracted from the Quantum ESPRESSO distribution. This mini-app is also important for future code-design activity on novel architectures. We present and discuss the profiling data obtained from the QE-FFT mini-app and the impact of the choice of parallelization parameters on the communication pattern.
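The pencil-to-plane repartitioning around the MPI_Alltoall described in the abstract can be sketched in plain Python. This is a toy serial emulation, not Quantum ESPRESSO code; the rank count, grid size, and helper names are invented for illustration:

```python
# Toy serial emulation of the pencil -> plane repartitioning around the
# MPI_Alltoall in a distributed 3D FFT. Illustrative only, not QE code.

def alltoall(send):
    """send[src][dst] -> recv[dst][src], like MPI_Alltoall over P ranks."""
    P = len(send)
    return [[send[src][dst] for src in range(P)] for dst in range(P)]

P = N = 4                       # 4 ranks, a 4x4x4 grid
# Rank r owns the z-pencils with x == r. After the 1D FFTs along z (not
# shown), each grid point is tagged here by its (x, y, z) coordinates.
# Rank r sends the z == dst slice of all its pencils to rank dst:
send = [[[(r, y, dst) for y in range(N)] for dst in range(P)]
        for r in range(P)]
recv = alltoall(send)

# After the transpose, rank r holds the complete x-y plane z == r,
# ready for the local 2D FFTs.
plane1 = sorted(pt for part in recv[1] for pt in part)
assert plane1 == [(x, y, 1) for x in range(N) for y in range(N)]
```

Tuning the number of task groups, as the abstract describes, changes how many ranks participate in each such transpose and therefore the size and cost of the all-to-all.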
Citations: 2
The MIG Framework: Enabling Transparent Process Migration in Open MPI
Pub Date: 2016-09-25 | DOI: 10.1145/2966884.2966903
F. Reghenzani, G. Pozzi, G. Massari, Simone Libutti, W. Fornaciari
Abstract: This paper introduces the mig framework: an Open MPI extension that transparently supports the migration of application processes across the nodes of a distributed High-Performance Computing (HPC) system. The framework provides mechanisms on top of which resource managers can implement policies to react to hardware faults, address performance variability, improve resource utilization, and perform fine-grained load balancing and power/thermal management. Compared to other state-of-the-art approaches, the mig framework does not require changes in the application code. Moreover, it is highly maintainable, since it is a largely self-contained solution that required very few changes to existing Open MPI frameworks. Experimental results show that the proposed extension does not introduce significant overhead in application execution, while the penalty of performing a migration can be properly accounted for by a resource manager.
Citations: 6
Modeling MPI Communication Performance on SMP Nodes: Is it Time to Retire the Ping Pong Test
Pub Date: 2016-09-25 | DOI: 10.1145/2966884.2966919
W. Gropp, Luke N. Olson, Philipp Samfass
Abstract: The "postal" model of communication [3, 8], T = α + βn, for sending n bytes of data between two processes with latency α and bandwidth 1/β, is perhaps the most commonly used communication performance model in parallel computing. It is often used in developing and evaluating parallel algorithms in high-performance computing, and was an effective model when first proposed; consequently, numerous "ping pong" benchmarks have been developed to measure its parameters. However, with the advent of multicore nodes connected to a single (or a few) network interfaces, the model has become a poor match for modern hardware. In this paper, we present a simple three-parameter model that better captures the behavior of current parallel computing systems, and we demonstrate its accuracy on several systems. In support of this model, which we call the max-rate model, we have developed an open-source benchmark that can be used to determine the model parameters.
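The abstract gives the postal model exactly; the sketch below pairs it with one plausible form of a three-parameter, max-rate-style model in which the aggregate rate of k concurrent process pairs per node saturates at a node injection rate. The exact functional form used in the paper may differ, and alpha, r_core, and r_node are invented illustrative values:

```python
# Postal model (from the abstract) vs. a plausible max-rate-style model.
# Parameter values are invented for illustration.

def t_postal(n, alpha, beta):
    """Two-process time to send n bytes: T = alpha + beta * n."""
    return alpha + beta * n

def t_maxrate(n, k, alpha, r_core, r_node):
    """Time for k process pairs per node, n bytes each, assuming the
    aggregate rate saturates at the node injection rate r_node."""
    rate = min(k * r_core, r_node)       # achievable aggregate bytes/s
    return alpha + (k * n) / rate

alpha, r_core, r_node = 1e-6, 1e9, 2e9   # 1 us latency, 1 GB/s, 2 GB/s

# Below saturation, per-pair time is independent of k (postal-like);
# beyond k = 2 the shared node link caps the rate and per-pair time grows:
assert t_maxrate(1e6, 2, alpha, r_core, r_node) == \
       t_maxrate(1e6, 1, alpha, r_core, r_node)
assert t_maxrate(1e6, 4, alpha, r_core, r_node) > \
       t_maxrate(1e6, 2, alpha, r_core, r_node)
```

A single ping-pong measurement only exercises the k = 1 regime, which is exactly why it can miss the node-level saturation such a model captures.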
Citations: 36
Optimizing PARSEC for Knights Landing
Pub Date: 2016-09-25 | DOI: 10.1145/2966884.2966895
A. Malhanov, Ariel J. Biller, Michael Chuvelev
Abstract: PARSEC is a massively parallel Density Functional Theory (DFT) code. As part of a modernization effort targeting the new Intel Knights Landing platform, we adapted the main computational kernel, represented as high-order finite-difference stencils, to use hybrid MPI and OpenMP runtimes. We also employed MPI-3 non-blocking neighborhood collectives for the halo exchange. We present performance data on the Knights Landing platform, including MPI traces portraying our exploration of communication-computation overlap in a hybrid MPI/OpenMP application, and the fine-tuning of the interplay between load balancing, which is static at the MPI level and dynamic in OpenMP.
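The halo-exchange pattern that MPI-3 neighborhood collectives express can be illustrated with a toy serial emulation: each rank in a periodic 1D ring exchanges its edge cells with its two neighbors, then applies a finite-difference stencil. This is an invented sketch (a 3-point stencil in 1D), not PARSEC's high-order 3D kernel:

```python
# Toy serial emulation of a neighborhood halo exchange on a periodic 1D
# ring of ranks, followed by a 3-point stencil update. Illustrative only.

P, n = 4, 5                              # ranks, cells per rank
data = [[float(r * n + i) for i in range(n)] for r in range(P)]

# Neighborhood exchange: each rank receives its neighbors' edge cells
# (what MPI_Neighbor_alltoall does on a periodic Cartesian topology).
halo_lo = [data[(r - 1) % P][-1] for r in range(P)]   # from left neighbor
halo_hi = [data[(r + 1) % P][0] for r in range(P)]    # from right neighbor

# Stencil update u'_i = (u_{i-1} + u_i + u_{i+1}) / 3 using the halos.
new = []
for r in range(P):
    ext = [halo_lo[r]] + data[r] + [halo_hi[r]]       # local array + halos
    new.append([(ext[i - 1] + ext[i] + ext[i + 1]) / 3.0
                for i in range(1, n + 1)])

# Interior cells depend only on local data and match the serial result:
assert new[1][2] == (data[1][1] + data[1][2] + data[1][3]) / 3.0
```

With the non-blocking collective variant, the exchange can be posted first and the interior cells updated while the halo messages are in flight, which is the communication-computation overlap the traces in the paper explore.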
Citations: 0
MPI Sessions: Leveraging Runtime Infrastructure to Increase Scalability of Applications at Exascale
Pub Date: 2016-09-25 | DOI: 10.1145/2966884.2966915
Daniel J. Holmes, K. Mohror, Ryan E. Grant, A. Skjellum, M. Schulz, Wesley Bland, J. Squyres
Abstract: MPI includes all processes in MPI_COMM_WORLD; this is untenable for reasons of scale, resiliency, and overhead. This paper offers a new approach that extends MPI with a new concept called Sessions, making two key contributions: a tighter integration with the underlying runtime system, and a scalable route to communication groups. This is a fundamental change in how MPI processes are organised and addressed, one that removes well-known scalability barriers by no longer requiring the global communicator MPI_COMM_WORLD.
Citations: 20
On the Expected and Observed Communication Performance with MPI Derived Datatypes
Pub Date: 2016-09-25 | DOI: 10.1145/2966884.2966905
Alexandra Carpen-Amarie, S. Hunold, J. Träff
Abstract: We examine natural expectations on communication performance when using MPI derived datatypes, in comparison to the baseline, "raw" performance of communicating simple, non-contiguous data layouts. We show that common MPI libraries sometimes violate these datatype performance expectations and discuss why this happens, but we also show cases where MPI libraries perform well. Our findings are in many ways surprising and disappointing. First, the performance of derived datatypes is sometimes worse than that of the semantically equivalent packing and unpacking using the corresponding MPI functionality. Second, the communication performance equivalence stated in the MPI standard between a single contiguous datatype and the repetition of its constituent datatype does not hold universally. Third, the heuristics typically employed by MPI libraries at type-commit time are insufficient to enforce natural performance guidelines, and better type-normalization heuristics could have a significant performance impact. We show cases where all the MPI type constructors are necessary to achieve the expected performance for certain data layouts.
We describe our benchmarking approach for verifying the datatype performance guidelines, and present extensive verification results for different MPI libraries.
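The semantic equivalence underlying the first guideline, that sending a strided layout via a derived datatype should be no slower than explicitly packing it into a contiguous buffer, can be illustrated in plain Python. This is a toy model of what the two paths transfer, with invented helper names, not an MPI benchmark:

```python
# Toy illustration of the equivalence the paper benchmarks: a strided
# derived datatype versus an explicit pack/unpack of the same layout.
# Helper names are invented; list slicing stands in for MPI machinery.

def gather_strided(buf, count, blocklen, stride):
    """Elements an MPI_Type_vector(count, blocklen, stride)-described
    send would pick up from buf."""
    out = []
    for b in range(count):
        out.extend(buf[b * stride : b * stride + blocklen])
    return out

def pack_then_send(buf, count, blocklen, stride):
    """Semantically equivalent manual packing (as with MPI_Pack) into a
    contiguous buffer; the receiver unpacks in the same order."""
    return [buf[b * stride + j]
            for b in range(count) for j in range(blocklen)]

buf = list(range(20))
a = gather_strided(buf, count=4, blocklen=2, stride=5)
b = pack_then_send(buf, count=4, blocklen=2, stride=5)
assert a == b == [0, 1, 5, 6, 10, 11, 15, 16]
```

The paper's finding is that although both paths move exactly these elements, real MPI libraries sometimes make the datatype path slower than the manual pack, violating the natural expectation.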
Citations: 10
Effective Calculation with Halo communication using Halo Functions
Pub Date: 2016-09-25 | DOI: 10.1145/2966884.2966893
K. Fukazawa, Toshiya Takami, T. Soga, Yoshiyuki Morie, T. Nanri
Abstract: Halo communication reduces parallel scalability. To mitigate this, we previously introduced a "halo thread" into our simulation code, but that did not fundamentally solve the problem under strong scaling. In this study, we developed halo functions that perform halo communication efficiently, allowing calculation and communication to proceed in a pipeline, and obtained good performance.
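The calculation/communication pipeline the abstract refers to typically follows a standard ordering: post the halo exchange, update the interior cells that need no halo data while the messages are in flight, complete the exchange, then update the boundary cells. The sketch below is a serial emulation of that ordering with invented names, not the paper's halo functions:

```python
# Toy sketch of overlapping halo communication with computation:
# post exchange -> update interior -> wait -> update boundary.
# Serial emulation; function names are illustrative.

log = []

def post_halo_exchange():
    log.append("post exchange")
    return "request"                     # stands in for an MPI request

def wait(request):
    log.append("wait " + request)        # like MPI_Wait on the request

def update_interior():
    log.append("update interior")        # overlaps the in-flight exchange

def update_boundary():
    log.append("update boundary")        # needs the received halo values

req = post_halo_exchange()
update_interior()                        # computation hides communication
wait(req)
update_boundary()

assert log == ["post exchange", "update interior", "wait request",
               "update boundary"]
```

Under strong scaling the interior shrinks relative to the boundary, so the window for hiding communication narrows, which is the regime where such pipelining matters most.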
Citations: 1
Generalisation of Recursive Doubling for AllReduce
Pub Date: 2016-09-25 | DOI: 10.1145/2966884.2966913
M. Ruefenacht, Mark Bull, S. Booth
Abstract: The performance of AllReduce is crucial at scale. The recursive-doubling-with-pairwise-exchange algorithm theoretically achieves O(log₂ N) scaling for short messages with N peers, but is limited by improvements in network latency. A multi-way exchange can be implemented using message pipelining, which is easier to improve than latency. Using our method, recursive multiplying, we show reductions in AllReduce execution time of between 8% and 40% over recursive doubling on a Cray XC30.
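The baseline that recursive multiplying generalises can be simulated serially: in round d, every rank exchanges its partial value with rank r XOR 2^d and both combine, so after log₂(P) rounds every rank holds the full reduction. This sketch shows plain recursive doubling, not the paper's multi-way method:

```python
# Serial simulation of recursive doubling for AllReduce over P = 2^k
# "ranks". Each round pairs rank r with rank r XOR d; the list
# comprehension models the simultaneous pairwise exchange.

def allreduce_recursive_doubling(values, op):
    P = len(values)
    assert P & (P - 1) == 0, "P must be a power of two"
    vals = list(values)
    d = 1
    while d < P:                         # log2(P) rounds in total
        vals = [op(vals[r], vals[r ^ d]) for r in range(P)]
        d *= 2
    return vals

result = allreduce_recursive_doubling([1, 2, 3, 4, 5, 6, 7, 8],
                                      lambda a, b: a + b)
assert result == [36] * 8                # every rank has the global sum
```

Recursive multiplying replaces each pairwise exchange with a k-way exchange, trading more messages per round (pipelined) for fewer latency-bound rounds.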
Citations: 5
Architecting Malleable MPI Applications for Priority-driven Adaptive Scheduling
Pub Date: 2016-09-25 | DOI: 10.1145/2966884.2966907
Pierre Lemarinier, K. Hasanov, S. Venugopal, K. Katrinis
Abstract: Future supercomputers will need to support both traditional HPC applications and Big Data/High-Performance Analysis applications seamlessly in a common environment. This motivates traditional job-scheduling systems to support malleable jobs, whose allocations can dynamically change in size, in order to adapt the amount of resources to the actual current needs of different applications. It also calls for future HPC applications to adapt to this environment and provide some level of malleability, releasing underutilized resources to other tasks. In this paper, we present and compare two methodologies for supporting such malleable MPI applications: (1) checkpoint/restart using the SCR library, and (2) dynamic data redistribution using the ULFM API and runtime. We examine their effects on application execution times as well as their impact on resource management.
Citations: 15
CAF Events Implementation Using MPI-3 Capabilities
Pub Date: 2016-09-25 | DOI: 10.1145/2966884.2966916
A. Fanfarillo, Jeff R. Hammond
Abstract: MPI-3.1 is currently the most recent version of the MPI standard. It adds important extensions to MPI-2, including a simplified semantics for the one-sided communication routines and a new tool interface capable of exposing performance data of the MPI implementation to users and libraries. These and other new features make MPI-3 a good candidate for being the transport layer of PGAS languages like Coarray Fortran. OpenCoarrays, the free coarray implementation used by the GNU Fortran compiler, implements almost all Coarray Fortran 2008 features and several Coarray Fortran 2015 features on top of MPI-3. Among the Fortran 2015 features, one of the most relevant for performance is events: a fine-grained synchronization mechanism based on a limited implementation of the well-known semaphore primitives. In this paper, we analyze two possible implementations of events using MPI-3 features and show how to dynamically select the best implementation according to the capabilities provided by the MPI implementation. We also show how events can improve overall performance by reducing idle times in parallel applications.
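The semaphore-like semantics the abstract describes (EVENT POST increments a counter; EVENT WAIT blocks until it is positive, then decrements it) can be modeled in a few lines. Here Python's threading.Semaphore stands in for the MPI-3-based implementations the paper compares; this is a semantics sketch with invented names, not OpenCoarrays code:

```python
# Toy model of CAF events as limited semaphores: one image posts an
# event after producing data, another waits on it before consuming.
# threading stands in for MPI-3 machinery; names are illustrative.

import threading

done = threading.Semaphore(0)            # the "event" variable, count = 0
order = []

def producer():
    order.append("work")                 # produce data ...
    done.release()                       # ... then EVENT POST

def consumer():
    done.acquire()                       # EVENT WAIT blocks until posted
    order.append("consume")              # safe to use the data now

t = threading.Thread(target=producer)
c = threading.Thread(target=consumer)
c.start(); t.start()
t.join(); c.join()
assert order == ["work", "consume"]      # post always precedes consume
```

Unlike a full barrier, only the waiting image blocks, which is how events reduce idle time relative to coarser synchronization.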
Citations: 6