{"title":"Implementation and performance evaluation of MPI persistent collectives in MPC: a case study","authors":"Stéphane Bouhrour, Julien Jaeger","doi":"10.1145/3416315.3416321","DOIUrl":null,"url":null,"abstract":"Persistent collective communications have recently been voted in the MPI standard, opening the door to many optimizations to reduce collectives cost, in particular for recurring operations. Indeed persistent semantics contains an initialization phase called only once for a specific collective. It can be used to collect building costs necessary to the collective, to avoid paying them each time the operation is performed. We propose an overview of the implementation of the persistent collectives in the MPC MPI runtime. We first present a naïve implementation for MPI runtimes already providing nonblocking collectives. Then, we improve this first implementation with two levels of caching optimizations. We present the performance results of the naïve and optimized versions and discuss their impact on different collective algorithms. We observe performance improvement compared to the naïve version on a repetitive benchmark, up to a 3x speedup for the reduce collective.","PeriodicalId":176723,"journal":{"name":"Proceedings of the 27th European MPI Users' Group Meeting","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 27th European MPI Users' Group Meeting","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3416315.3416321","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
Persistent collective communications have recently been voted in the MPI standard, opening the door to many optimizations to reduce collectives cost, in particular for recurring operations. Indeed persistent semantics contains an initialization phase called only once for a specific collective. It can be used to collect building costs necessary to the collective, to avoid paying them each time the operation is performed. We propose an overview of the implementation of the persistent collectives in the MPC MPI runtime. We first present a naïve implementation for MPI runtimes already providing nonblocking collectives. Then, we improve this first implementation with two levels of caching optimizations. We present the performance results of the naïve and optimized versions and discuss their impact on different collective algorithms. We observe performance improvement compared to the naïve version on a repetitive benchmark, up to a 3x speedup for the reduce collective.