LRUM: Local Reliability Protocol for Unreliable Hardware Multicast

Hoang-Vu Dang, Brian Smith, R. Graham, G. Shainer
Published in: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region
Publication date: 2018-01-28
DOI: 10.1145/3149457.3149467 (https://doi.org/10.1145/3149457.3149467)
Citations: 1

Abstract

This paper describes two new Message Passing Interface (MPI) broadcast algorithms whose performance is essentially independent of communicator size. They use the InfiniBand unreliable datagram (UD) hardware multicast capability and achieve a latency very close to the MPI ping-pong point-to-point latency between the root and the farthest process in the communicator. The algorithms rely on a new scale-independent local reliability protocol that guarantees destination buffer availability under load imbalance. Performance is compared with that of HPC-X/Open MPI, MVAPICH, and IntelMPI. The new algorithms provide the best available latency across the board. At 128 processes, compared with the next-best broadcast implementation, they are 2.3 times better at four megabytes, 5% better at four kilobytes, and comparable for eight-byte broadcasts. The new algorithms also demonstrate the lowest streaming latency and the highest broadcast throughput.
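
The abstract does not spell out the protocol itself, so the following is only a toy illustration of the problem it names: with an unreliable datagram transport, a datagram that arrives when the receiver has no receive buffer posted is silently dropped, so a process that falls behind under load imbalance loses data. The `Receiver` class, the `broadcast` helper, the buffer count, and the drain rates are all hypothetical names and values chosen for this sketch; none of it is the LRUM protocol or any MPI or InfiniBand API.

```python
from collections import deque


class Receiver:
    """Toy receiver with a fixed pool of pre-posted receive buffers (hypothetical model)."""

    def __init__(self, posted_buffers):
        self.free = posted_buffers  # buffers currently posted and empty
        self.queue = deque()        # datagrams delivered but not yet consumed
        self.dropped = []           # sequence numbers lost for lack of a buffer

    def deliver(self, seq):
        # Unreliable datagram semantics: no posted buffer means a silent drop.
        if self.free == 0:
            self.dropped.append(seq)
            return
        self.free -= 1
        self.queue.append(seq)

    def consume(self, n):
        # The application drains up to n datagrams and re-posts their buffers.
        for _ in range(min(n, len(self.queue))):
            self.queue.popleft()
            self.free += 1


def broadcast(receivers, num_chunks, drain_rates):
    """Multicast num_chunks datagrams; each receiver drains at its own rate."""
    for seq in range(num_chunks):
        for receiver, rate in zip(receivers, drain_rates):
            receiver.deliver(seq)
            receiver.consume(rate)


fast = Receiver(posted_buffers=4)
slow = Receiver(posted_buffers=4)  # models a process that is late to the broadcast
broadcast([fast, slow], num_chunks=12, drain_rates=[1, 0])

print("fast dropped:", fast.dropped)
print("slow dropped:", slow.dropped)
```

In this model the receiver that keeps up loses nothing, while the stalled receiver loses every datagram sent after its four buffers fill. A reliability protocol of the kind the abstract describes has to prevent or repair exactly this situation, and the gaps in the sequence numbers are what make the loss detectable.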