Design and Evaluation of Network Topology-/Speed- Aware Broadcast Algorithms for InfiniBand Clusters

2011 IEEE International Conference on Cluster Computing, Pub Date: 2011-09-26, DOI: 10.1109/CLUSTER.2011.43

H. Subramoni, K. Kandalla, Jérôme Vienne, S. Sur, B. Barth, K. Tomko, R. McLay, K. Schulz, D. Panda

Citations: 38

Abstract

It is an established fact that network topology can affect the performance of scientific parallel applications. However, little work has been done to design an easy-to-use solution inside the communication library of a parallel programming model that hides from the end user the complexity of making application performance agnostic to network topology. Similarly, rapid improvements in networking technology and speed are making many commodity clusters heterogeneous with respect to network speed. For example, switches and adapters from different InfiniBand generations (SDR - 8 Gbps, DDR - 16 Gbps and QDR - 36 Gbps) are integrated into a single system. This creates an additional challenge: making the communication library aware of the performance implications of heterogeneous link speeds, so that it can perform optimizations that take link speed into account. In this paper, we propose a framework that automatically detects the topology and speed of an InfiniBand network and makes this information available to users through an easy-to-use interface. We also modify the MPI library to dynamically query this topology detection service and to build a topology model of the underlying network. We have redesigned the broadcast algorithm to use this network topology information and to dynamically adapt its communication pattern to the characteristics of the underlying network. To the best of our knowledge, this is the first such work for InfiniBand clusters.

Our experimental results show that, for large homogeneous systems and large message sizes, the proposed network topology-aware scheme improves broadcast latency by up to 14% over the default scheme at the micro-benchmark level. At the application level, the proposed framework improves total application run-time by up to 8%, especially at larger job sizes. On the heterogeneous SDR-DDR InfiniBand cluster, the proposed network speed-aware algorithms deliver micro-benchmark performance for small and medium message sizes that is on par with runs on the DDR-only portion of the cluster. We also demonstrate that the network speed-aware algorithms perform 70% to 100% better than the naive algorithms when both are run on the heterogeneous SDR-DDR InfiniBand cluster.
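The abstract does not spell out how the redesigned broadcast uses topology and speed information. As an illustration only, the Python sketch below shows one common way such a scheme can work; it is not the paper's algorithm. It assumes each rank already knows a leaf-switch id and its link speed (the `Node` fields are hypothetical stand-ins for what a topology detection service might report). The message goes first from the root to one leader per switch, then from each leader to its switch-local ranks. In the speed-aware variant, faster DDR ranks are placed earlier in each binomial tree so that slower SDR links tend to become leaves. `completion_time` is a toy cost model (hypothetical) that serializes each parent's sends, runs each transfer at the slower endpoint's rate, and ignores startup latency.

```python
from collections import defaultdict
from dataclasses import dataclass

# Link rates quoted in the abstract, in Gbit/s (only SDR and DDR are modeled).
LINK_GBPS = {"SDR": 8.0, "DDR": 16.0}


@dataclass(frozen=True)
class Node:
    rank: int
    switch: int  # hypothetical leaf-switch id, as a topology service might report it
    speed: str   # "SDR" or "DDR"


def binomial_edges(order):
    """Binomial tree over the rank list `order`, rooted at order[0].

    Returns (round, parent, child) tuples sorted by round. Position i
    receives in round floor(log2(i)) from position i minus its highest
    set bit, so earlier positions act as senders in more rounds.
    """
    edges = []
    for i in range(1, len(order)):
        rnd = i.bit_length() - 1
        edges.append((rnd, order[i - (1 << rnd)], order[i]))
    edges.sort(key=lambda e: e[0])  # stable: keeps each parent's send order
    return edges


def two_level_schedule(nodes, root, speed_aware=True):
    """Phase 1: root -> leaders of the other switches. Phase 2: leader -> switch members."""
    by_rank = {n.rank: n for n in nodes}
    groups = defaultdict(list)
    for n in nodes:
        groups[n.switch].append(n.rank)

    def ordered(ranks, head=None):
        # Head first; then faster links first (if speed-aware); then lowest rank.
        def key(r):
            rate = LINK_GBPS[by_rank[r].speed] if speed_aware else 0.0
            return (r != head, -rate, r)
        return sorted(ranks, key=key)

    root_switch = by_rank[root].switch
    leaders = {sw: root if sw == root_switch else ordered(ranks)[0]
               for sw, ranks in groups.items()}
    inter = binomial_edges(ordered(list(leaders.values()), head=root))
    intra = [e for sw, ranks in groups.items()
             for e in binomial_edges(ordered(ranks, head=leaders[sw]))]
    return [inter, intra]


def completion_time(nodes, root, phases, size_bytes):
    """Toy model: a parent's sends are serialized, each transfer runs at the
    slower endpoint's rate, and per-message startup latency is ignored."""
    by_rank = {n.rank: n for n in nodes}
    ready, busy = {root: 0.0}, {}
    for phase in phases:
        for _, parent, child in phase:
            rate = min(LINK_GBPS[by_rank[parent].speed], LINK_GBPS[by_rank[child].speed])
            start = busy.get(parent, ready[parent])
            busy[parent] = start + size_bytes * 8 / (rate * 1e9)
            ready[child] = busy[parent]
    return max(ready.values())


if __name__ == "__main__":
    # Toy layout: ranks 0-3 on switch 0 (rank 1 is SDR), ranks 4 (SDR) and 5 (DDR) on switch 1.
    cluster = [Node(0, 0, "DDR"), Node(1, 0, "SDR"), Node(2, 0, "DDR"),
               Node(3, 0, "DDR"), Node(4, 1, "SDR"), Node(5, 1, "DDR")]
    for aware in (True, False):
        phases = two_level_schedule(cluster, root=0, speed_aware=aware)
        # 10**9 bytes with rates in Gbit/s gives a result in seconds
        print(aware, completion_time(cluster, 0, phases, 10**9))
    # prints "True 2.0" then "False 3.0"
```

In this toy layout the speed-aware ordering picks the DDR rank 5 as the leader of switch 1 and pushes both SDR ranks to leaf positions, so no SDR link sits on a path that forwards the message further. The model figures are illustrative only and are not the paper's measurements.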
