Efficient all-to-all broadcast in all-port mesh and torus networks
Yuanyuan Yang, Jianchao Wang
Proceedings Fifth International Symposium on High-Performance Computer Architecture, 1999 (published 1999-01-09)
DOI: 10.1109/HPCA.1999.744382
Citations: 33
Abstract
All-to-all communication is one of the densest communication patterns and occurs in many important parallel-computing applications. In this paper, we present a new all-to-all broadcast algorithm for all-port mesh and torus networks. Unlike existing all-to-all broadcast algorithms, the new algorithm exploits the overlap between message-switching time and transmission time, and it achieves optimal transmission time for all-to-all broadcast. In addition, in most cases the total communication delay is within a small constant of the lower bound for all-to-all broadcast. Finally, the algorithm is conceptually simple and symmetric with respect to every message and every node, so it can be easily implemented in hardware and can achieve the optimum in practice.
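To make "optimal transmission time" concrete, the sketch below computes a standard counting lower bound; it is not the bound or algorithm from the paper. It assumes a simple model: each of the N nodes holds one unit-size message, each directed link carries one message per step, and an all-port node can receive on all of its incoming links in the same step. Every node must then receive N - 1 messages, so no algorithm can finish in fewer than ceil((N - 1) / in-degree) transmission steps. A torus node has 2 incoming links per dimension; in a mesh the binding node is a corner, with only 1 per dimension. The helper name `transmission_lower_bound` is hypothetical, and the bound ignores message-switching (startup) time, which the paper's algorithm overlaps with transmission.

```python
import math


def transmission_lower_bound(dims, torus=True):
    """Minimum number of message-transmission steps for all-to-all broadcast.

    Assumed model (illustrative, not taken from the paper): each of the N nodes
    holds one unit-size message, each directed link carries one message per
    step, and a node can receive on all of its incoming links at once (all-port).
    """
    if torus and any(k < 3 for k in dims):
        raise ValueError("torus dimensions must have size >= 3")
    if not torus and any(k < 2 for k in dims):
        raise ValueError("mesh dimensions must have size >= 2")

    n = math.prod(dims)  # total number of nodes N
    # A torus node has 2 incoming links per dimension; the worst-placed mesh
    # node (a corner) has only 1 incoming link per dimension.
    in_links = 2 * len(dims) if torus else len(dims)
    # Every node must receive the N - 1 messages it does not already hold.
    return math.ceil((n - 1) / in_links)


if __name__ == "__main__":
    print(transmission_lower_bound((8, 8)))               # 8x8 torus
    print(transmission_lower_bound((8, 8), torus=False))  # 8x8 mesh
    print(transmission_lower_bound((4, 4, 4)))            # 4x4x4 torus
```

For an 8×8 torus this gives ceil(63 / 4) = 16 steps, while an 8×8 mesh gives ceil(63 / 2) = 32, because a corner node has only two incoming links. This is why wraparound links matter for all-to-all broadcast.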