{"title":"MPI优于uDAPL:高性能和可移植性可以跨架构存在吗?","authors":"Lei Chai, R. Noronha, D. Panda","doi":"10.1109/CCGRID.2006.70","DOIUrl":null,"url":null,"abstract":"Looking at the TOP 500 list of supercomputers we can see that different architectures and networking technologies appear on the scene from time to time. The networking technologies are also changing along with the advances of processor technologies. While the hardware has been constantly changing, parallel applications written in different paradigms have remained largely unchanged. With MPI being the most popular parallel computing standard, it is crucial to have an MPI implementation portable across different networks and architectures. It is also desirable to have such an MPI deliver high performance. In this paper we take on this challenge. We have designed an MPI with both portability and portable high performance using the emerging uDAPL interface. We present the design alternatives and a comprehensive performance evaluation of this new design. The results show that this design can improve the startup time and communication performance by 30% compared with our previous work. It also delivers the same good performance as MPI implemented over native APIs of the underlying interconnect. We also present a multistream MPI design which aims to achieve high bandwidth across networks and operating systems. Experimental results on Solaris show that the multi-stream design can improve bandwidth over InfiniBand by 30%, and improve the application performance by up to 11%.","PeriodicalId":419226,"journal":{"name":"Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2006-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":"{\"title\":\"MPI over uDAPL: Can High Performance and Portability Exist Across Architectures?\",\"authors\":\"Lei Chai, R. Noronha, D. Panda\",\"doi\":\"10.1109/CCGRID.2006.70\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Looking at the TOP 500 list of supercomputers we can see that different architectures and networking technologies appear on the scene from time to time. The networking technologies are also changing along with the advances of processor technologies. While the hardware has been constantly changing, parallel applications written in different paradigms have remained largely unchanged. With MPI being the most popular parallel computing standard, it is crucial to have an MPI implementation portable across different networks and architectures. It is also desirable to have such an MPI deliver high performance. In this paper we take on this challenge. We have designed an MPI with both portability and portable high performance using the emerging uDAPL interface. We present the design alternatives and a comprehensive performance evaluation of this new design. The results show that this design can improve the startup time and communication performance by 30% compared with our previous work. It also delivers the same good performance as MPI implemented over native APIs of the underlying interconnect. We also present a multistream MPI design which aims to achieve high bandwidth across networks and operating systems. 
Experimental results on Solaris show that the multi-stream design can improve bandwidth over InfiniBand by 30%, and improve the application performance by up to 11%.\",\"PeriodicalId\":419226,\"journal\":{\"name\":\"Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06)\",\"volume\":\"11 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2006-05-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"8\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CCGRID.2006.70\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CCGRID.2006.70","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
MPI over uDAPL: Can High Performance and Portability Exist Across Architectures?
Looking at the TOP500 list of supercomputers, we can see that different architectures and networking technologies appear on the scene from time to time. Networking technologies are also evolving alongside advances in processor technology. While the hardware has been constantly changing, parallel applications written in different paradigms have remained largely unchanged. Since MPI is the most popular parallel computing standard, it is crucial to have an MPI implementation that is portable across different networks and architectures. It is also desirable for such an MPI to deliver high performance. In this paper we take on this challenge. Using the emerging uDAPL interface, we have designed an MPI that provides both portability and portable high performance. We present the design alternatives and a comprehensive performance evaluation of this new design. The results show that this design can improve startup time and communication performance by 30% compared with our previous work. It also matches the performance of MPI implemented over the native APIs of the underlying interconnect. We also present a multi-stream MPI design that aims to achieve high bandwidth across networks and operating systems. Experimental results on Solaris show that the multi-stream design can improve bandwidth over InfiniBand by 30%, and improve application performance by up to 11%.
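The portability argument rests on uDAPL, the user-level Direct Access Programming Library defined by the DAT Collaborative: any interconnect with a uDAPL provider exposes the same set of verbs. As a rough illustration of what that interface looks like, the sketch below sets up the active side of a uDAPL connection using the DAT 1.x C API. The adapter name "ofa-v2-ib0", the queue lengths, and the omitted address resolution are placeholder assumptions; this is a minimal sketch, not the paper's actual code.

```c
/*
 * Minimal sketch: active-side connection setup with the uDAPL (DAT 1.x)
 * C API. Adapter name and queue depths are placeholders; error handling
 * is collapsed into one macro for brevity.
 */
#include <stdio.h>
#include <stdlib.h>
#include <dat/udat.h>

#define CHECK(ret, msg)                                     \
    do { if ((ret) != DAT_SUCCESS) {                        \
        fprintf(stderr, "%s failed: 0x%x\n", msg, (ret));   \
        exit(1); } } while (0)

int main(void)
{
    DAT_IA_HANDLE  ia;
    DAT_EVD_HANDLE async_evd, conn_evd, dto_evd;
    DAT_PZ_HANDLE  pz;
    DAT_EP_HANDLE  ep;
    DAT_EVENT      event;
    DAT_COUNT      nmore;
    DAT_RETURN     ret;

    /* Open the interface adapter; the name is provider-specific
     * (here, a placeholder for an InfiniBand HCA). */
    ret = dat_ia_open("ofa-v2-ib0", 8, &async_evd, &ia);
    CHECK(ret, "dat_ia_open");

    /* Protection zone: scopes memory registrations and endpoints. */
    ret = dat_pz_create(ia, &pz);
    CHECK(ret, "dat_pz_create");

    /* Separate event dispatchers for connection and data-transfer events. */
    ret = dat_evd_create(ia, 8, DAT_HANDLE_NULL,
                         DAT_EVD_CONNECTION_FLAG, &conn_evd);
    CHECK(ret, "dat_evd_create (conn)");
    ret = dat_evd_create(ia, 64, DAT_HANDLE_NULL,
                         DAT_EVD_DTO_FLAG, &dto_evd);
    CHECK(ret, "dat_evd_create (dto)");

    /* Endpoint: the uDAPL analogue of a connected queue pair. */
    ret = dat_ep_create(ia, pz, dto_evd, dto_evd, conn_evd, NULL, &ep);
    CHECK(ret, "dat_ep_create");

    /* A real transport would now call dat_ep_connect() with a remote
     * address exchanged out of band (e.g. during MPI job startup),
     * then wait on conn_evd for the connection event:
     *
     *   dat_evd_wait(conn_evd, DAT_TIMEOUT_INFINITE, 1, &event, &nmore);
     */
    (void)event; (void)nmore;
    return 0;
}
```

Because every uDAPL provider implements these same calls, an MPI built on this layer can run unchanged over InfiniBand, iWARP, or any other interconnect with a provider, which is the portability claim the abstract makes.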
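The multi-stream result comes from striping large messages across several connections so that the throughput limit of a single stream does not cap achievable bandwidth. Below is a minimal sketch of the striping arithmetic only, assuming a hypothetical post_send_on_stream() helper that stands in for a per-endpoint send primitive (such as a dat_ep_post_send on one of several endpoints); completion handling and reassembly are omitted.

```c
/*
 * Sketch: splitting one large message into roughly equal chunks and
 * posting each chunk on its own stream so the chunks move in parallel.
 * post_send_on_stream() is a hypothetical placeholder, not part of
 * uDAPL or MPI.
 */
#include <stddef.h>

#define NUM_STREAMS 4

/* Placeholder: a real implementation would post an RDMA send on the
 * endpoint associated with `stream`. */
static void post_send_on_stream(int stream, const char *buf, size_t len)
{
    (void)stream; (void)buf; (void)len;
}

void multi_stream_send(const char *buf, size_t len)
{
    /* Ceiling division: chunk size per stream. */
    size_t chunk = (len + NUM_STREAMS - 1) / NUM_STREAMS;

    for (int s = 0; s < NUM_STREAMS; s++) {
        size_t off = (size_t)s * chunk;
        if (off >= len)
            break;                      /* message shorter than expected */
        size_t this_len = (len - off < chunk) ? (len - off) : chunk;
        post_send_on_stream(s, buf + off, this_len);
    }
    /* Waiting on each stream's event dispatcher and preserving message
     * order at the receiver are left out of this sketch. */
}
```

The design choice is a classic bandwidth/latency trade-off: striping pays off only above some message-size threshold, since each extra stream adds per-chunk posting and completion overhead.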