LRUM: Local Reliability Protocol for Unreliable Hardware Multicast

Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region. Pub Date: 2018-01-28. DOI: 10.1145/3149457.3149467

Hoang-Vu Dang, Brian Smith, R. Graham, G. Shainer


Citations: 1

Abstract

This paper describes two new Message Passing Interface (MPI) broadcast algorithms whose performance is essentially independent of communicator size. They are built on the InfiniBand unreliable datagram (UD) hardware multicast capabilities and achieve a latency very close to the MPI ping-pong point-to-point latency between the root and the farthest process in the communicator. The algorithms rely on a new scale-independent local reliability protocol that guarantees destination buffer availability under load imbalance. Performance is compared to that of HPC-X/Open MPI, MVAPICH, and Intel MPI. The new algorithms provide the best available latency across the board. At 128 processes, compared to the next best broadcast implementation, the new algorithms are 2.3 times better at four megabytes, 5% better at four kilobytes, and comparable for eight-byte broadcasts. The new algorithms also demonstrate the lowest streaming latency and the highest broadcast throughput.
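The abstract does not spell out how the local reliability protocol works, so the following is only a toy model of the general idea of layering reliability on top of a lossy multicast. It is not the LRUM protocol itself. The function names, the independent per-chunk drop model, and the choice of a single upstream neighbor for repair are all assumptions made for illustration, and buffer availability and flow control are not modeled at all.

```python
import random


def multicast(chunks, num_receivers, drop_prob, rng):
    """Deliver each chunk to each receiver independently.

    A dropped delivery stands in for a lost unreliable-datagram packet
    (hypothetical loss model, not taken from the paper).
    """
    received = [dict() for _ in range(num_receivers)]
    for seq, payload in enumerate(chunks):
        for buf in received:
            if rng.random() >= drop_prob:
                buf[seq] = payload
    return received


def repair_from_neighbor(chunks, received):
    """Fill each receiver's gaps from one upstream neighbor only.

    Receiver 0 repairs from the root, which holds every chunk; receiver i
    repairs from receiver i-1 once that one is complete. Each process talks
    to a single peer regardless of how many receivers exist (assumption).
    """
    upstream = dict(enumerate(chunks))  # the root's complete copy
    repaired = 0
    for buf in received:
        for seq in range(len(chunks)):
            if seq not in buf:
                buf[seq] = upstream[seq]
                repaired += 1
        upstream = buf  # now complete, so it can serve the next receiver
    return repaired


def broadcast(chunks, num_receivers, drop_prob=0.1, seed=0):
    """Run one lossy multicast followed by neighbor-based repair."""
    rng = random.Random(seed)
    received = multicast(chunks, num_receivers, drop_prob, rng)
    repaired = repair_from_neighbor(chunks, received)
    messages = [b"".join(buf[s] for s in range(len(chunks))) for buf in received]
    return messages, repaired
```

Calling `broadcast([b"ab", b"cd", b"ef"], 4, drop_prob=0.5)` returns the reassembled message for each of the four receivers together with the number of chunks that had to be repaired. In this sketch the repair step runs sequentially along a chain, which a real implementation would not want; it is kept that way only to make the single-neighbor idea easy to follow.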
