Practical, linear-time, fully distributed algorithms for irregular gather and scatter

J. Träff
Proceedings of the 24th European MPI Users' Group Meeting
Publication date: 2017-02-20
DOI: 10.1145/3127024.3127025 (https://doi.org/10.1145/3127024.3127025)
Citations: 5

Abstract

We present new, simple, fully distributed, practical algorithms with linear time communication cost for irregular gather and scatter operations in which processors contribute or consume possibly different amounts of data. In a homogeneous, linear cost transmission model with start-up latency $\alpha$ and cost per unit $\beta$, the new algorithms take time $3\lceil \log_2 p \rceil \alpha + \beta \sum_{i \neq r} m_i$, where $p$ is the number of processors, $m_i$ the amount of data for processor $i$, $0 \le i < p$, and processor $r$, $0 \le r < p$, a root processor determined by the algorithm. With a fixed, externally given root processor $r$, there is an additive time penalty of at most $\beta (M_{d'} - m_{r_{d'}} - \sum_{0 \le j < d'} M_j)$ for some $d' < \lceil \log_2 p \rceil$, where each $M_j$ is the total amount of data in a tree of $2^j$ different processors with roots $r_j$ as constructed by the algorithm. The worst-case time penalty is less than $\beta \sum_{i \neq r} m_i$. The algorithms have attractive properties for implementing the operations for MPI (the Message-Passing Interface). Standard algorithms using fixed trees take time either $\lceil \log_2 p \rceil (\alpha + \beta \sum_{i \neq r} m_i)$ in the worst case, or $(p - 1)\alpha + \sum_{i \neq r} \beta m_i$. We have used the new algorithms to give prototype implementations for the MPI_Gatherv and MPI_Scatterv collectives of MPI, and present benchmark results from a small and a medium-large InfiniBand cluster. In order to structure the experimental evaluation we formulate new performance guidelines for irregular collectives that can be used to assess the performance in relation to the corresponding regular collectives. We show that the new algorithms can fulfill these performance expectations within a large margin, and that standard implementations do not.
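To make the three cost expressions in the abstract easier to compare, the following minimal sketch evaluates them in the linear cost model for a given list of data sizes. It only restates the formulas; it does not implement the gather or scatter algorithms themselves. The function names, the choice to pass the root r as a parameter (in the paper the new algorithms determine the root), and the sample values of alpha and beta are assumptions made for illustration.

```python
def ceil_log2(p: int) -> int:
    """Return ceil(log2(p)) for p >= 1 using exact integer arithmetic."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return (p - 1).bit_length()


def data_excluding_root(m, r):
    """Sum of m[i] over all processors i != r."""
    return sum(m) - m[r]


def cost_new(m, r, alpha, beta):
    """3*ceil(log2 p)*alpha + beta * sum_{i != r} m_i (bound for the new algorithms)."""
    return 3 * ceil_log2(len(m)) * alpha + beta * data_excluding_root(m, r)


def cost_fixed_tree_worst(m, r, alpha, beta):
    """ceil(log2 p) * (alpha + beta * sum_{i != r} m_i) (fixed tree, worst case)."""
    return ceil_log2(len(m)) * (alpha + beta * data_excluding_root(m, r))


def cost_linear(m, r, alpha, beta):
    """(p - 1)*alpha + sum_{i != r} beta*m_i (root communicates with each processor in turn)."""
    return (len(m) - 1) * alpha + beta * data_excluding_root(m, r)


if __name__ == "__main__":
    # Hypothetical inputs: 1024 processors, one of which holds most of the data.
    sizes = [8] * 1023 + [1_000_000]
    alpha, beta = 1000.0, 1.0  # illustrative values only, not measured
    for name, cost in [("new", cost_new),
                       ("fixed tree (worst case)", cost_fixed_tree_worst),
                       ("linear", cost_linear)]:
        print(f"{name}: {cost(sizes, 0, alpha, beta):.0f}")
```

With these formulas the trade-off named in the abstract is visible directly: the fixed-tree worst case multiplies the data term by the logarithmic factor, the linear algorithm pays p − 1 start-up latencies, and the new bound keeps a logarithmic number of start-ups while the data term appears only once.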