{"title":"Enhanced memory management for scalable MPI intra-node communication on many-core processor","authors":"Joong-Yeon Cho, Hyun-Wook Jin, Dukyun Nam","doi":"10.1145/3127024.3127035","DOIUrl":null,"url":null,"abstract":"As the number of cores installed in a single computing node drastically increases, the intra-node communication between parallel processes becomes more important. The parallel programming models, such as Message Passing Interface (MPI), internally perform memory-intensive operations for intra-node communication. Thus, to address the scalability issue on many-core processors, it is critical to exploit emerging memory features provided by the contemporary computer systems. For example, the latest many-core processors are equipped with a high-bandwidth on-package memory Modern 64-bit processors also support a large page size (e.g., 2MB), which can significantly reduce the number of TLB misses. The on-package memory and the huge pages have considerable potential for improving the performance of intra-node communication. However, such features are not thoroughly investigated in terms of intra-node communication in the literature. In this paper, we propose enhanced memory management schemes to efficiently utilize the on-package memory and provide support for huge pages. The proposed schemes can significantly reduce the data copy and memory mapping overheads in MPI intra-node communication. Our experimental results show that our implementation on MVAPICH2 can improve the bandwidth of point-to-point communication up to 373%, and can reduce the latency of collective communication by 79% on an Intel Xeon Phi Knights Landing (KNL) processor.","PeriodicalId":118516,"journal":{"name":"Proceedings of the 24th European MPI Users' Group Meeting","volume":"15 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 24th European MPI Users' Group Meeting","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3127024.3127035","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4
Abstract
As the number of cores installed in a single computing node drastically increases, the intra-node communication between parallel processes becomes more important. The parallel programming models, such as Message Passing Interface (MPI), internally perform memory-intensive operations for intra-node communication. Thus, to address the scalability issue on many-core processors, it is critical to exploit emerging memory features provided by the contemporary computer systems. For example, the latest many-core processors are equipped with a high-bandwidth on-package memory Modern 64-bit processors also support a large page size (e.g., 2MB), which can significantly reduce the number of TLB misses. The on-package memory and the huge pages have considerable potential for improving the performance of intra-node communication. However, such features are not thoroughly investigated in terms of intra-node communication in the literature. In this paper, we propose enhanced memory management schemes to efficiently utilize the on-package memory and provide support for huge pages. The proposed schemes can significantly reduce the data copy and memory mapping overheads in MPI intra-node communication. Our experimental results show that our implementation on MVAPICH2 can improve the bandwidth of point-to-point communication up to 373%, and can reduce the latency of collective communication by 79% on an Intel Xeon Phi Knights Landing (KNL) processor.