PAARD: Proximity-aware all-reduce communication for dragonfly networks
Authors: Junchao Ma, Dezun Dong, Fei Lei, Liquan Xiao
Journal: Journal of Parallel and Distributed Computing, volume 209, Article 105201 (impact factor 4.0; JCR Q1, Computer Science, Theory & Methods)
DOI: 10.1016/j.jpdc.2025.105201
URL: https://www.sciencedirect.com/science/article/pii/S0743731525001686
Publication date: 2026-03-01 (Epub 2025-11-19)
Publication type: Journal Article
Citation count: 0
Abstract
All-Reduce is one of the most widely used collective communication operations in both research and engineering for high-performance computing (HPC) and distributed machine learning (DML). Previous optimization work on All-Reduce has designed new algorithms only for different message sizes and different numbers of processors, ignoring the gains that can be achieved by considering the network topology. Dragonfly is a popular topology for current and future high-speed interconnection networks. Its hierarchical structure can be exploited to reduce hardware overhead while ensuring low end-to-end transmission latency. This paper offers a first attempt to design an efficient All-Reduce algorithm for dragonfly networks, referred to as PAARD. Based on the hierarchical characteristics of the dragonfly network, PAARD first proposes an end-to-end solution to alleviate congestion, which could remarkably boost performance. We carefully design the PAARD algorithm to ensure desirable performance with acceptable overhead and to guarantee generality in marginal cases. Then, to illustrate the effectiveness of PAARD, we compare its performance with that of the state-of-the-art Halving-doubling (HD) and Ring algorithms. The simulation results demonstrate that our design improves execution time by 3x relative to HD and 4.19x relative to Ring on 256 nodes of a 342-node dragonfly with minimal routing.
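For readers unfamiliar with the two baselines named in the abstract, the following is a minimal, topology-unaware sketch of a Ring all-reduce and a Halving-doubling all-reduce (sum), simulated in plain Python on in-memory lists. It is not the PAARD algorithm, whose details the abstract does not give. The function names, the list-of-lists data layout, and the assumptions that the vector length is divisible by the rank count (and, for Halving-doubling, that the rank count is a power of two) are illustrative choices only.

```python
from typing import List


def ring_all_reduce(vectors: List[List[int]]) -> List[List[int]]:
    """Simulate a ring all-reduce (sum). Assumes len(vector) % ranks == 0."""
    p, n = len(vectors), len(vectors[0])
    assert n % p == 0, "illustrative assumption: length divisible by rank count"
    size = n // p
    bufs = [list(v) for v in vectors]  # copy so the inputs are not mutated

    def chunk(c: int) -> range:
        return range(c * size, (c + 1) * size)

    # Phase 1, reduce-scatter: p - 1 steps, each rank sends one chunk to its
    # right neighbour, which adds it to its own copy of that chunk.
    for s in range(p - 1):
        msgs = [(r, (r - s) % p) for r in range(p)]
        payloads = [[bufs[r][i] for i in chunk(c)] for r, c in msgs]
        for (r, c), payload in zip(msgs, payloads):
            dst = (r + 1) % p
            for i, x in zip(chunk(c), payload):
                bufs[dst][i] += x

    # Phase 2, all-gather: p - 1 steps, fully reduced chunks travel round the ring.
    for s in range(p - 1):
        msgs = [(r, (r + 1 - s) % p) for r in range(p)]
        payloads = [[bufs[r][i] for i in chunk(c)] for r, c in msgs]
        for (r, c), payload in zip(msgs, payloads):
            dst = (r + 1) % p
            for i, x in zip(chunk(c), payload):
                bufs[dst][i] = x
    return bufs


def halving_doubling_all_reduce(vectors: List[List[int]]) -> List[List[int]]:
    """Simulate recursive halving (reduce-scatter) plus recursive doubling (all-gather)."""
    p, n = len(vectors), len(vectors[0])
    assert p & (p - 1) == 0, "illustrative assumption: power-of-two rank count"
    assert n % p == 0, "illustrative assumption: length divisible by rank count"
    bufs = [list(v) for v in vectors]
    lo, hi = [0] * p, [n] * p  # index range each rank is currently responsible for

    # Recursive halving: exchange with the partner at distance d, keep one half.
    d = p // 2
    while d >= 1:
        snap = [list(b) for b in bufs]  # values as they were before this step
        for r in range(p):
            partner = r ^ d
            mid = (lo[r] + hi[r]) // 2
            keep = (lo[r], mid) if r & d == 0 else (mid, hi[r])
            for i in range(*keep):
                bufs[r][i] = snap[r][i] + snap[partner][i]
            lo[r], hi[r] = keep
        d //= 2

    # Recursive doubling: exchange owned ranges with partners at distance 1, 2, 4, ...
    d = 1
    while d < p:
        snap = [list(b) for b in bufs]
        owned = list(zip(lo, hi))
        for r in range(p):
            plo, phi = owned[r ^ d]
            bufs[r][plo:phi] = snap[r ^ d][plo:phi]
            lo[r], hi[r] = min(owned[r][0], plo), max(owned[r][1], phi)
        d *= 2
    return bufs


if __name__ == "__main__":
    data = [[rank * 10 + i for i in range(8)] for rank in range(4)]
    print(ring_all_reduce(data)[0])
    print(halving_doubling_all_reduce(data)[0])
```

In this sketch the Ring variant takes 2(p - 1) communication steps and the Halving-doubling variant takes 2 log2(p) steps for p ranks. Neither chooses its communication partners with the physical network in mind, which is the gap the abstract says PAARD targets on dragonfly networks.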
About the journal:
This international journal is directed to researchers, engineers, educators, managers, programmers, and users of computers who have particular interests in parallel processing and/or distributed computing.
The Journal of Parallel and Distributed Computing publishes original research papers and timely review articles on the theory, design, evaluation, and use of parallel and/or distributed computing systems. The journal also features special issues on these topics; again covering the full range from the design to the use of our targeted systems.