Applying on Node Aggregation Methods to MPI Alltoall Collectives: Matrix Block Aggregation Algorithm

G. Chochia, David G. Solt, Joshua Hursey
{"title":"Applying on Node Aggregation Methods to MPI Alltoall Collectives: Matrix Block Aggregation Algorithm","authors":"G. Chochia, David G. Solt, Joshua Hursey","doi":"10.1145/3555819.3555821","DOIUrl":null,"url":null,"abstract":"This paper presents algorithms for all-to-all and all-to-all(v) MPI collectives optimized for small-medium messages and large task counts per node to support multicore CPUs in HPC systems. The complexity of these algorithms is analyzed for two metrics: the number of messages and the volume of data exchanged per task. These algorithms have optimal complexity for the second metric, which is better by a logarithmic factor than that in algorithms designed for short messages, with logarithmic complexity for the first metric. It is shown that the balance between these two metrics is key to achieving optimal performance. The performance advantage of the new algorithm is demonstrated at scale by comparing performance versus logarithmic algorithm implementations in Open MPI and Spectrum MPI. The two-phase design for the all-to-all(v) algorithm is presented. It combines efficient implementations for short and large messages in a single framework which is known to be an issue in logarithmic all-to-all(v) algorithms.","PeriodicalId":423846,"journal":{"name":"Proceedings of the 29th European MPI Users' Group Meeting","volume":"33 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 29th European MPI Users' Group Meeting","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3555819.3555821","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

This paper presents algorithms for the all-to-all and all-to-all(v) MPI collectives, optimized for small-to-medium messages and large per-node task counts, to support multicore CPUs in HPC systems. The complexity of these algorithms is analyzed with respect to two metrics: the number of messages and the volume of data exchanged per task. The new algorithms achieve logarithmic complexity for the first metric and optimal complexity for the second, which is better by a logarithmic factor than algorithms designed for short messages. It is shown that balancing these two metrics is key to achieving optimal performance. The performance advantage of the new algorithm is demonstrated at scale by comparing it against logarithmic algorithm implementations in Open MPI and Spectrum MPI. A two-phase design for the all-to-all(v) algorithm is also presented; it combines efficient handling of short and large messages in a single framework, which is known to be difficult in logarithmic all-to-all(v) algorithms.
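To make the two metrics concrete, here is a back-of-the-envelope comparison for an all-to-all over $P$ tasks with $m$ bytes destined from each task to every other task. It assumes Bruck's algorithm as a representative short-message (logarithmic) baseline; the exact constants in the paper may differ, so this is an illustrative sketch rather than the paper's own analysis.

\[
\begin{array}{lll}
\text{Direct (pairwise) exchange:} & P-1 \ \text{messages/task}, & m(P-1) \ \text{bytes/task},\\
\text{Bruck-style (short-message):} & \lceil \log_2 P \rceil \ \text{messages/task}, & \approx \tfrac{mP}{2}\,\lceil \log_2 P \rceil \ \text{bytes/task},\\
\text{This paper's target:} & O(\log P) \ \text{messages/task}, & O(mP) \ \text{bytes/task}.
\end{array}
\]

The logarithmic baselines minimize the message count at the cost of forwarding each block roughly $\log_2 P$ times; the matrix block aggregation approach aims to keep the message count logarithmic while removing that factor from the data volume, which is the optimality claimed above for the second metric.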