MVAPICH-Aptus: Scalable high-performance multi-transport MPI over InfiniBand

Matthew J. Koop, T. Jones, D. Panda
{"title":"MVAPICH-Aptus: Scalable high-performance multi-transport MPI over InfiniBand","authors":"Matthew J. Koop, T. Jones, D. Panda","doi":"10.1109/IPDPS.2008.4536283","DOIUrl":null,"url":null,"abstract":"The need for computational cycles continues to exceed availability, driving commodity clusters to increasing scales. With upcoming clusters containing tens-of-thousands of cores, InfiniBand is a popular interconnect on these clusters, due to its low latency (1.5 musec) and high bandwidth (1.5 GB/sec). Since most scientific applications running on these clusters are written using the message passing interface (MPI) as the parallel programming model, the MPI library plays a key role in the performance and scalability of the system. Nearly all MPIs implemented over InfiniBand currently use the reliable connection (RC) transport of InfiniBand to implement message passing. Using this transport exclusively, however, has been shown to potentially reach a memory footprint of over 200 MB/task at 16 K tasks for the MPI library. The Unreliable Datagram (UD) transport, however, offers higher scalability, but at the cost of medium and large message performance. In this paper we present a multi-transport MPI design, MVAPICH-Aptus, that uses both the RC and UD transports of InfiniBand to deliver scalability and performance higher than that of a single-transport MPI design. Evaluation of our hybrid design on 512 cores shows a 12% improvement over an RC-based design and 4% better than a UD-based design for the SMG2000 application benchmark. In addition, for the molecular dynamics application NAMD we show a 10% improvement over an RC-only design. To the best of our knowledge, this is the first such analysis and design of optimized MPI using both UD and RC.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"208 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2008-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"50","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2008 IEEE International Symposium on Parallel and Distributed Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPS.2008.4536283","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 50

Abstract

The need for computational cycles continues to exceed availability, driving commodity clusters to increasing scales. With upcoming clusters containing tens of thousands of cores, InfiniBand is a popular interconnect due to its low latency (1.5 μsec) and high bandwidth (1.5 GB/sec). Since most scientific applications running on these clusters are written using the Message Passing Interface (MPI) as the parallel programming model, the MPI library plays a key role in the performance and scalability of the system. Nearly all MPIs implemented over InfiniBand currently use the Reliable Connection (RC) transport of InfiniBand to implement message passing. Using this transport exclusively, however, has been shown to potentially reach a memory footprint of over 200 MB per task at 16K tasks for the MPI library. The Unreliable Datagram (UD) transport offers higher scalability, but at the cost of medium- and large-message performance. In this paper we present a multi-transport MPI design, MVAPICH-Aptus, that uses both the RC and UD transports of InfiniBand to deliver scalability and performance higher than that of a single-transport MPI design. Evaluation of our hybrid design on 512 cores shows a 12% improvement over an RC-based design and a 4% improvement over a UD-based design for the SMG2000 application benchmark. In addition, for the molecular dynamics application NAMD we show a 10% improvement over an RC-only design. To the best of our knowledge, this is the first such analysis and design of an optimized MPI using both UD and RC.
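For context on the figures above, 200 MB per task at 16K tasks works out to roughly 12-13 KB of per-connection state and buffers for each peer, which is what motivates falling back to the connectionless UD transport. The sketch below is not MVAPICH-Aptus source; it is a minimal model, under assumed helper names and an assumed size threshold, of how a hybrid MPI library might pick between UD and RC on a per-message basis (small messages over the scalable UD channel, larger messages over RC when a connection to the peer already exists).

```c
/*
 * Illustrative sketch only (not MVAPICH-Aptus code): a minimal model of
 * per-message channel selection between InfiniBand UD and RC transports.
 * The threshold value and helpers such as rc_channel_exists() are
 * hypothetical; the real library's selection logic is more involved.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed cutoff: messages at or below this size ride the UD channel,
 * whose datagrams are bounded by the path MTU, so larger payloads would
 * need segmentation or an RC connection. */
#define UD_MSG_THRESHOLD 2048  /* bytes; assumed value */

typedef enum { TRANSPORT_UD, TRANSPORT_RC } transport_t;

/* Stand-in for per-peer state: a real MPI library would track whether an
 * RC queue pair has already been established to this peer. */
static bool rc_channel_exists(int peer_rank) {
    /* For the sketch, pretend even-numbered peers already have an RC
     * connection set up from earlier large-message traffic. */
    return (peer_rank % 2) == 0;
}

/* Pick a transport: small messages use the connectionless UD channel
 * (memory per process stays roughly constant with job size); larger
 * messages prefer RC, falling back to UD if no connection exists yet. */
static transport_t select_transport(size_t msg_size, int peer_rank) {
    if (msg_size <= UD_MSG_THRESHOLD)
        return TRANSPORT_UD;
    if (rc_channel_exists(peer_rank))
        return TRANSPORT_RC;
    /* A real implementation might lazily establish an RC connection here
     * when a peer repeatedly exchanges large messages. */
    return TRANSPORT_UD;
}

int main(void) {
    const size_t sizes[] = { 64, 1024, 8192, 262144 };
    for (int peer = 0; peer < 2; peer++) {
        for (size_t i = 0; i < sizeof sizes / sizeof sizes[0]; i++) {
            transport_t t = select_transport(sizes[i], peer);
            printf("peer %d, %6zu bytes -> %s\n", peer, sizes[i],
                   t == TRANSPORT_UD ? "UD" : "RC");
        }
    }
    return 0;
}
```

The paper's actual channel-selection policy is more elaborate than a single size cutoff; the sketch only conveys the general trade-off the abstract describes: UD keeps per-process memory bounded, while RC connections are reserved for traffic patterns where their medium- and large-message performance pays off.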