E. Gallardo, Jérôme Vienne, L. Fialho, P. Teller, J. Browne
Proceedings of the 22nd European MPI Users' Group Meeting, 2015-09-21. DOI: 10.1145/2802658.2802667 (https://doi.org/10.1145/2802658.2802667)
MPI Advisor: a Minimal Overhead Tool for MPI Library Performance Tuning
A majority of parallel applications executed on HPC clusters use MPI for communication between processes. Most users treat MPI as a black box, executing their programs using the cluster's default settings. While the default settings perform adequately in many cases, it is well known that optimizing the MPI environment can significantly improve application performance. Although the existing optimization tools are effective when used by performance experts, they require deep knowledge of MPI library behavior and of the underlying hardware architecture on which the application will be executed. Therefore, an easy-to-use tool that provides recommendations for configuring the MPI environment to optimize application performance is highly desirable. This paper addresses this need by presenting an easy-to-use methodology and tool, named MPI Advisor, that requires just a single execution of the input application to characterize its predominant communication behavior and determine the MPI configuration that may enhance its performance on the target combination of MPI library and hardware architecture. Currently, MPI Advisor provides recommendations that address the four most commonly occurring MPI-related performance bottlenecks, which are related to the choice of: 1) point-to-point protocol (eager vs. rendezvous), 2) collective communication algorithm, 3) MPI tasks-to-cores mapping, and 4) InfiniBand transport protocol. The performance gains obtained by implementing the recommended optimizations in the case studies presented in this paper range from a few percent to more than 40%. Specifically, using this tool, we were able to improve the performance of HPCG with MVAPICH2 on four nodes of the Stampede cluster from 6.9 GFLOP/s to 10.1 GFLOP/s, an improvement of about 46%. Since the tool provides application-specific recommendations, it also informs the user about correct usage of MPI.
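To make the first of the four recommendation categories concrete, the following is a minimal sketch of how a profile of point-to-point message sizes could be turned into an eager-threshold suggestion. It is not MPI Advisor's actual algorithm, which the abstract does not describe. The function name `suggest_eager_threshold`, the use of the most frequent message size as the "predominant" one, the power-of-two rounding, the 1 MiB cap, and the assumption that messages at or below the threshold are sent eagerly are all hypothetical choices made for illustration.

```python
from collections import Counter


def suggest_eager_threshold(message_sizes, current_threshold, max_threshold=1 << 20):
    """Return a suggested eager threshold in bytes, or None to keep the current one.

    Hypothetical heuristic (not taken from the paper): if the most frequent
    point-to-point message size exceeds the current threshold, suggest the
    next power of two at or above that size, provided it stays under a cap.
    """
    if not message_sizes:
        return None  # nothing was profiled, so there is nothing to recommend

    # "Predominant" is assumed here to mean the most frequent message size.
    predominant_size, _ = Counter(message_sizes).most_common(1)[0]

    if predominant_size <= current_threshold:
        return None  # predominant messages are already sent eagerly

    if predominant_size > max_threshold:
        return None  # assumed cap: eager buffering would cost too much memory

    # Round up to the next power of two at or above the predominant size.
    return 1 << (predominant_size - 1).bit_length()


if __name__ == "__main__":
    # Illustrative profile: mostly 20000-byte messages plus a few small ones.
    profile = [20000] * 8 + [512] * 2
    print(suggest_eager_threshold(profile, current_threshold=17408))
```

The threshold value 17408 in the usage example is an arbitrary stand-in for a library default. A real tool would also have to weigh the extra buffer memory that a larger eager threshold consumes on each process.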