Improving Application Performance and Predictability Using Multiple Virtual Lanes in Modern Multi-core InfiniBand Clusters

2010 39th International Conference on Parallel Processing. Pub Date: 2010-09-13. DOI: 10.1109/ICPP.2010.54

H. Subramoni, P. Lai, S. Sur, D. Panda


Citations: 15

Abstract

Network congestion is an important factor affecting the performance of large-scale jobs in supercomputing clusters, especially with the wide deployment of multi-core processors. The blocking nature of current collectives makes such congestion a critical factor in their performance. On the other hand, modern interconnects such as InfiniBand provide many novel features, such as Virtual Lanes, aimed at delivering better performance to end applications. Theoretical research on network congestion indicates that Head-of-Line (HoL) blocking is a common cause of congestion and that using multiple virtual lanes is one way to alleviate it. In this context, we use the multiple virtual lanes provided by the InfiniBand standard to alleviate network congestion and thereby improve the performance of various high-performance computing applications on modern multi-core clusters. We integrate our scheme into the MVAPICH2 MPI library. To the best of our knowledge, this is the first implementation that takes advantage of multiple virtual lanes at the MPI level. We perform experiments at the native InfiniBand, microbenchmark, and application levels. Our experimental evaluation shows that using multiple virtual lanes can improve the predictability of message arrival by up to 10 times in the presence of network congestion. Our microbenchmark-level evaluation with multiple communication streams shows that using multiple virtual lanes can improve the bandwidth, latency, and message rate of medium-sized messages by up to 13%. Using multiple virtual lanes, we are also able to improve the performance of the Alltoall collective operation for medium message sizes by up to 20%. When multiple jobs compete for the same network resource, segregating their traffic into multiple virtual lanes improves Alltoall performance by up to 12%. We also see that our scheme can improve the performance of the collective operations used inside the CPMD application by 11% and the overall performance of the CPMD application itself by up to 6%.
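The Head-of-Line blocking effect mentioned in the abstract can be illustrated with a small toy model. The sketch below is not the paper's MVAPICH2 implementation and does not model InfiniBand arbitration or credit-based flow control; it only assumes that each lane is an independent FIFO queue, that the link sends at most one packet per time step, and that one destination refuses traffic for a while. The function name `simulate`, its parameters, and the `dest % num_lanes` lane-assignment policy are all hypothetical choices made for illustration.

```python
from collections import deque


def simulate(packets, num_lanes, blocked_dest, blocked_until):
    """Return the departure time step of each packet from one input port.

    packets       -- destination ids in arrival order (all queued at t = 0)
    num_lanes     -- number of independent FIFO queues (stand-ins for lanes)
    blocked_dest  -- destination that accepts nothing before `blocked_until`
    blocked_until -- first time step at which `blocked_dest` accepts packets
    """
    lanes = [deque() for _ in range(num_lanes)]
    for idx, dest in enumerate(packets):
        # Hypothetical policy: pick the lane from the destination id.
        lanes[dest % num_lanes].append((idx, dest))

    departure = {}
    t = 0
    next_lane = 0
    while len(departure) < len(packets):
        # The link sends at most one packet per step; lanes are polled
        # round-robin starting from `next_lane`.
        for offset in range(num_lanes):
            lane_id = (next_lane + offset) % num_lanes
            lane = lanes[lane_id]
            if not lane:
                continue
            idx, dest = lane[0]
            if dest == blocked_dest and t < blocked_until:
                # The head of this lane is stalled, so every packet queued
                # behind it in the same lane has to wait as well.
                continue
            lane.popleft()
            departure[idx] = t
            next_lane = (lane_id + 1) % num_lanes
            break
        t += 1
    return [departure[i] for i in range(len(packets))]


# One packet to a congested destination (0) followed by three packets to an
# uncongested destination (1); destination 0 is blocked until step 5.
traffic = [0, 1, 1, 1]
single_lane = simulate(traffic, num_lanes=1, blocked_dest=0, blocked_until=5)
two_lanes = simulate(traffic, num_lanes=2, blocked_dest=0, blocked_until=5)
print("single lane:", single_lane)
print("two lanes:  ", two_lanes)
```

In this toy setup, the three packets bound for the uncongested destination leave at steps 6 to 8 when they share one queue with the stalled packet, and at steps 0 to 2 when the stalled packet sits in its own lane. The stalled packet itself leaves at step 5 in both cases, so the extra lane removes the collateral delay rather than the congestion itself.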


