Tuning system-dependent applications with alternative MPI calls: a case study

T. Le
Third ACIS Int'l Conference on Software Engineering Research, Management and Applications (SERA'05), published 2005-08-11
DOI: https://doi.org/10.1109/SERA.2005.67
Citations: 2

Abstract

This paper shows the effectiveness of using optimized MPI calls for MPI-based applications on different architectures. Using optimized MPI calls can yield a reasonable performance gain for most MPI-based applications running on most high-performance distributed systems. Because the relative performance of different MPI function calls can be uncorrelated across system architectures, tuning system-dependent MPI applications by exploring alternative MPI calls is the simplest but most effective optimization method. The paper first shows that, on a particular system, various MPI calls that produce the same communication pattern differ noticeably in performance. These performance differences are not the same across systems. The paper then shows that an MPI application can be optimized well on different systems by using different MPI calls on each system. The communication patterns tested include point-to-point and collective communications. The MPI-based application used in this study is a general-purpose transient dynamic finite element application, and the benchmark problems are public-domain 3D car crash problems. The experimental results show that, for the same communication purpose, alternative MPI calls can give quite different communication performance on the Fujitsu HPC2500 system and the 8-node AMD Athlon cluster, but nearly the same performance on other systems such as the Intel Itanium2 and AMD Opteron clusters.
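The tuning idea described in the abstract (measure each alternative call on each system, then use the fastest one there) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not say which MPI calls were compared, so the variant labels (`MPI_Allreduce` versus `MPI_Reduce` followed by `MPI_Bcast`, two ways to obtain a global reduction on all ranks), the system names, and the timing values are hypothetical placeholders, not results from the paper.

```python
from typing import Dict

# Hypothetical timings in seconds: system -> {variant label -> measured time}.
# The numbers are placeholders for illustration, not measurements from the paper.
Timings = Dict[str, Dict[str, float]]


def select_variants(timings: Timings) -> Dict[str, str]:
    """Return, for each system, the variant label with the smallest time."""
    return {
        system: min(per_variant, key=per_variant.get)
        for system, per_variant in timings.items()
    }


def spread(per_variant: Dict[str, float]) -> float:
    """Ratio of slowest to fastest variant; near 1.0 means the choice barely matters."""
    return max(per_variant.values()) / min(per_variant.values())


# Hypothetical systems: two where the best variant differs, one where it does not matter.
example_timings: Timings = {
    "system_a": {"MPI_Allreduce": 1.8, "MPI_Reduce+MPI_Bcast": 1.2},
    "system_b": {"MPI_Allreduce": 0.9, "MPI_Reduce+MPI_Bcast": 1.5},
    "system_c": {"MPI_Allreduce": 1.0, "MPI_Reduce+MPI_Bcast": 1.0},
}

if __name__ == "__main__":
    for system, variant in select_variants(example_timings).items():
        print(system, variant, round(spread(example_timings[system]), 2))
```

In a real study the timings would come from running each variant inside the application on each target machine. The `spread` helper flags systems where the alternatives perform about the same, in which case the choice of call has little effect.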