Allreduce algorithm optimization of OpenMPI communication library

Guangyao Zhang, Wei Wan, Junhong Li
DOI: 10.1117/12.3031959 (https://doi.org/10.1117/12.3031959)
International Conference on Algorithms, Microchips and Network Applications, published 2024-06-08
Citations: 0

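Abstract

MPI (Message Passing Interface) plays a crucial role in parallel computing. The Allreduce implementation in the OpenMPI communication library handles non-power-of-two process counts poorly. Its two existing algorithms for this case, recursive_doubling and reduce_scatter_allgather, exclude some processes so that the number of participating processes becomes a power of two. However, the rule that selects the excluded processes takes too few factors into account, so the participating processes end up unevenly distributed across nodes, which substantially reduces communication efficiency. To address this problem, we take the layout of processes on nodes into account and redefine the range from which processes are excluded. We apply generic load-balancing optimizations to both algorithms and adapt them to domestic architectures, which improves load balance. Experiments at a scale of 16 nodes show performance improvements of up to 30% for recursive_doubling and up to 21% for reduce_scatter_allgather.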
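The starting point is how OpenMPI's Allreduce copes with a non-power-of-two process count. As we understand the base recursive_doubling implementation in OpenMPI's `coll/base` component, with p processes and p' the largest power of two not exceeding p, it excludes r = p - p' ranks: among ranks 0 to 2r-1, each even rank sends its data to rank + 1 and drops out, the remaining p' ranks run recursive doubling, and at the end each odd rank below 2r copies the result back to its even neighbour. The sketch below is a sequential Python simulation of that scheme under those assumptions. It is not OpenMPI code, the function names are hypothetical, and the reduction is fixed to an element-wise sum.

```python
from typing import List


def largest_pow2_le(n: int) -> int:
    """Return the largest power of two that is <= n (requires n >= 1)."""
    return 1 << (n.bit_length() - 1)


def simulated_recursive_doubling_allreduce(inputs: List[List[int]]) -> List[List[int]]:
    """Sequentially simulate a sum-Allreduce over len(inputs) ranks.

    Hypothetical helper: it mimics the structure of the recursive_doubling
    scheme described above; it is not OpenMPI code.
    """
    size = len(inputs)
    buf = [list(v) for v in inputs]          # per-rank working buffer
    pof2 = largest_pow2_le(size)             # p'
    extra = size - pof2                      # r, the number of excluded ranks

    def add(a: List[int], b: List[int]) -> List[int]:
        return [x + y for x, y in zip(a, b)]

    # Step 1: among ranks 0 .. 2r-1, each even rank sends its data to
    # rank + 1 and drops out; the odd rank folds it into its own buffer.
    for r in range(0, 2 * extra, 2):
        buf[r + 1] = add(buf[r + 1], buf[r])

    # Renumber the p' participants as 0 .. p'-1 and map them back to real ranks.
    def real_rank(newrank: int) -> int:
        return 2 * newrank + 1 if newrank < extra else newrank + extra

    # Step 2: recursive doubling; in each round every participant combines its
    # partial result with the participant whose new rank differs in one bit.
    distance = 1
    while distance < pof2:
        snapshot = [buf[real_rank(n)] for n in range(pof2)]  # values before this round
        for n in range(pof2):
            buf[real_rank(n)] = add(snapshot[n], snapshot[n ^ distance])
        distance <<= 1

    # Step 3: each odd rank below 2r sends the final result back to rank - 1.
    for r in range(0, 2 * extra, 2):
        buf[r] = list(buf[r + 1])
    return buf


if __name__ == "__main__":
    # Six ranks (not a power of two): rank r contributes [r, 1].
    result = simulated_recursive_doubling_allreduce([[r, 1] for r in range(6)])
    for rank, row in enumerate(result):
        print(rank, row)  # each line shows [15, 6]
```

Because only p' ranks take part in the doubling phase, which ranks are dropped in step 1 decides how many active processes remain on each node; that choice is what the abstract identifies as the source of imbalance.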
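The imbalance described in the abstract can be made concrete by counting, per node, the ranks that stay in the power-of-two phase. The sketch below assumes block placement (rank r runs on node r // ppn, where ppn is processes per node) and the recursive_doubling convention that the even ranks below 2r drop out. For contrast it includes a hypothetical node-aware rule that spreads the r exclusions as evenly as possible over the nodes and pairs each excluded rank with its odd neighbour on the same node. This rule only illustrates the idea of taking process layout into account; it is not the authors' algorithm, whose details the abstract does not give, and it does not show the renumbering the doubling phase would then need.

```python
from typing import List


def largest_pow2_le(n: int) -> int:
    """Return the largest power of two that is <= n (requires n >= 1)."""
    return 1 << (n.bit_length() - 1)


def default_excluded(size: int) -> List[int]:
    """Ranks that drop out under the recursive_doubling convention:
    the even ranks among 0 .. 2r-1, where r = size - p'."""
    extra = size - largest_pow2_le(size)
    return list(range(0, 2 * extra, 2))


def node_aware_excluded(num_nodes: int, ppn: int) -> List[int]:
    """Hypothetical node-aware rule (illustration only): spread the r
    exclusions over the nodes as evenly as possible and, on each node, drop
    the even local slots 0, 2, ... so every excluded rank can fold its data
    into its odd neighbour on the same node."""
    if ppn % 2:
        raise ValueError("this sketch assumes an even number of processes per node")
    size = num_nodes * ppn
    extra = size - largest_pow2_le(size)
    base, rem = divmod(extra, num_nodes)
    excluded: List[int] = []
    for node in range(num_nodes):
        k = base + (1 if node < rem else 0)   # exclusions assigned to this node
        first = node * ppn                    # first rank on this node (block placement)
        excluded.extend(first + 2 * i for i in range(k))
    return excluded


def participants_per_node(num_nodes: int, ppn: int, excluded: List[int]) -> List[int]:
    """Count the ranks left in the power-of-two phase on each node,
    assuming block placement (rank r runs on node r // ppn)."""
    counts = [ppn] * num_nodes
    for r in excluded:
        counts[r // ppn] -= 1
    return counts


if __name__ == "__main__":
    nodes, ppn = 4, 6                          # 24 ranks, so r = 24 - 16 = 8
    print(participants_per_node(nodes, ppn, default_excluded(nodes * ppn)))    # [3, 3, 4, 6]
    print(participants_per_node(nodes, ppn, node_aware_excluded(nodes, ppn)))  # [4, 4, 4, 4]
```

With 4 nodes of 6 processes, the default rule concentrates all 8 exclusions on the first three nodes, leaving [3, 3, 4, 6] active processes per node, while the illustrative node-aware rule leaves [4, 4, 4, 4]. Since extra is always less than half the total rank count, an even ppn guarantees each node has enough odd neighbours to absorb its share of exclusions.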