{"title":"MPI优于uDAPL:高性能和可移植性可以跨架构存在吗?","authors":"Lei Chai, R. Noronha, D. Panda","doi":"10.1109/CCGRID.2006.70","DOIUrl":null,"url":null,"abstract":"Looking at the TOP 500 list of supercomputers we can see that different architectures and networking technologies appear on the scene from time to time. The networking technologies are also changing along with the advances of processor technologies. While the hardware has been constantly changing, parallel applications written in different paradigms have remained largely unchanged. With MPI being the most popular parallel computing standard, it is crucial to have an MPI implementation portable across different networks and architectures. It is also desirable to have such an MPI deliver high performance. In this paper we take on this challenge. We have designed an MPI with both portability and portable high performance using the emerging uDAPL interface. We present the design alternatives and a comprehensive performance evaluation of this new design. The results show that this design can improve the startup time and communication performance by 30% compared with our previous work. It also delivers the same good performance as MPI implemented over native APIs of the underlying interconnect. We also present a multistream MPI design which aims to achieve high bandwidth across networks and operating systems. Experimental results on Solaris show that the multi-stream design can improve bandwidth over InfiniBand by 30%, and improve the application performance by up to 11%.","PeriodicalId":419226,"journal":{"name":"Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2006-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":"{\"title\":\"MPI over uDAPL: Can High Performance and Portability Exist Across Architectures?\",\"authors\":\"Lei Chai, R. Noronha, D. Panda\",\"doi\":\"10.1109/CCGRID.2006.70\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Looking at the TOP 500 list of supercomputers we can see that different architectures and networking technologies appear on the scene from time to time. The networking technologies are also changing along with the advances of processor technologies. While the hardware has been constantly changing, parallel applications written in different paradigms have remained largely unchanged. With MPI being the most popular parallel computing standard, it is crucial to have an MPI implementation portable across different networks and architectures. It is also desirable to have such an MPI deliver high performance. In this paper we take on this challenge. We have designed an MPI with both portability and portable high performance using the emerging uDAPL interface. We present the design alternatives and a comprehensive performance evaluation of this new design. The results show that this design can improve the startup time and communication performance by 30% compared with our previous work. It also delivers the same good performance as MPI implemented over native APIs of the underlying interconnect. We also present a multistream MPI design which aims to achieve high bandwidth across networks and operating systems. 
Experimental results on Solaris show that the multi-stream design can improve bandwidth over InfiniBand by 30%, and improve the application performance by up to 11%.\",\"PeriodicalId\":419226,\"journal\":{\"name\":\"Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06)\",\"volume\":\"11 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2006-05-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"8\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CCGRID.2006.70\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CCGRID.2006.70","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
MPI over uDAPL: Can High Performance and Portability Exist Across Architectures?
Looking at the TOP500 list of supercomputers, we can see that different architectures and networking technologies appear on the scene from time to time. Networking technologies are also evolving alongside advances in processor technology. While the hardware has been constantly changing, parallel applications written in different paradigms have remained largely unchanged. Since MPI is the most popular parallel computing standard, it is crucial to have an MPI implementation that is portable across different networks and architectures. It is also desirable for such an MPI to deliver high performance. In this paper we take on this challenge. Using the emerging uDAPL interface, we have designed an MPI that provides both portability and portable high performance. We present the design alternatives and a comprehensive performance evaluation of this new design. The results show that this design can improve startup time and communication performance by 30% compared with our previous work. It also matches the performance of MPI implemented over the native APIs of the underlying interconnect. We also present a multi-stream MPI design that aims to achieve high bandwidth across networks and operating systems. Experimental results on Solaris show that the multi-stream design can improve bandwidth over InfiniBand by 30%, and improve application performance by up to 11%.
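The portability argument rests on uDAPL, the user-level Direct Access Programming Library defined by the DAT Collaborative: any interconnect with a uDAPL provider exposes the same set of verbs. As a rough illustration of what that interface looks like, the sketch below sets up the active side of a uDAPL connection using the DAT 1.x C API. The adapter name "ofa-v2-ib0", the queue lengths, and the omitted address resolution are placeholder assumptions; this is a minimal sketch, not the paper's actual code.

```c
/*
 * Minimal sketch: active-side connection setup with the uDAPL (DAT 1.x)
 * C API. Adapter name and queue depths are placeholders; error handling
 * is collapsed into one macro for brevity.
 */
#include <stdio.h>
#include <stdlib.h>
#include <dat/udat.h>

#define CHECK(ret, msg)                                     \
    do { if ((ret) != DAT_SUCCESS) {                        \
        fprintf(stderr, "%s failed: 0x%x\n", msg, (ret));   \
        exit(1); } } while (0)

int main(void)
{
    DAT_IA_HANDLE  ia;
    DAT_EVD_HANDLE async_evd, conn_evd, dto_evd;
    DAT_PZ_HANDLE  pz;
    DAT_EP_HANDLE  ep;
    DAT_EVENT      event;
    DAT_COUNT      nmore;
    DAT_RETURN     ret;

    /* Open the interface adapter; the name is provider-specific
     * (here, a placeholder for an InfiniBand HCA). */
    ret = dat_ia_open("ofa-v2-ib0", 8, &async_evd, &ia);
    CHECK(ret, "dat_ia_open");

    /* Protection zone: scopes memory registrations and endpoints. */
    ret = dat_pz_create(ia, &pz);
    CHECK(ret, "dat_pz_create");

    /* Separate event dispatchers for connection and data-transfer events. */
    ret = dat_evd_create(ia, 8, DAT_HANDLE_NULL,
                         DAT_EVD_CONNECTION_FLAG, &conn_evd);
    CHECK(ret, "dat_evd_create (conn)");
    ret = dat_evd_create(ia, 64, DAT_HANDLE_NULL,
                         DAT_EVD_DTO_FLAG, &dto_evd);
    CHECK(ret, "dat_evd_create (dto)");

    /* Endpoint: the uDAPL analogue of a connected queue pair. */
    ret = dat_ep_create(ia, pz, dto_evd, dto_evd, conn_evd, NULL, &ep);
    CHECK(ret, "dat_ep_create");

    /* A real transport would now call dat_ep_connect() with a remote
     * address exchanged out of band (e.g. during MPI job startup),
     * then wait on conn_evd for the connection event:
     *
     *   dat_evd_wait(conn_evd, DAT_TIMEOUT_INFINITE, 1, &event, &nmore);
     */
    (void)event; (void)nmore;
    return 0;
}
```

Because every uDAPL provider implements these same calls, an MPI built on this layer can run unchanged over InfiniBand, iWARP, or any other interconnect with a provider, which is the portability claim the abstract makes.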
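The multi-stream result comes from striping large messages across several connections so that the throughput limit of a single stream does not cap achievable bandwidth. Below is a minimal sketch of the striping arithmetic only, assuming a hypothetical post_send_on_stream() helper that stands in for a per-endpoint send primitive (such as a dat_ep_post_send on one of several endpoints); completion handling and reassembly are omitted.

```c
/*
 * Sketch: splitting one large message into roughly equal chunks and
 * posting each chunk on its own stream so the chunks move in parallel.
 * post_send_on_stream() is a hypothetical placeholder, not part of
 * uDAPL or MPI.
 */
#include <stddef.h>

#define NUM_STREAMS 4

/* Placeholder: a real implementation would post an RDMA send on the
 * endpoint associated with `stream`. */
static void post_send_on_stream(int stream, const char *buf, size_t len)
{
    (void)stream; (void)buf; (void)len;
}

void multi_stream_send(const char *buf, size_t len)
{
    /* Ceiling division: chunk size per stream. */
    size_t chunk = (len + NUM_STREAMS - 1) / NUM_STREAMS;

    for (int s = 0; s < NUM_STREAMS; s++) {
        size_t off = (size_t)s * chunk;
        if (off >= len)
            break;                      /* message shorter than expected */
        size_t this_len = (len - off < chunk) ? (len - off) : chunk;
        post_send_on_stream(s, buf + off, this_len);
    }
    /* Waiting on each stream's event dispatcher and preserving message
     * order at the receiver are left out of this sketch. */
}
```

The design choice is a classic bandwidth/latency trade-off: striping pays off only above some message-size threshold, since each extra stream adds per-chunk posting and completion overhead.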