MPI Derived Datatypes: Performance and Portability Issues

Proceedings of the 25th European MPI Users' Group Meeting. Pub Date: 2018-09-23. DOI: 10.1145/3236367.3236378

Qingqing Xiong, P. Bangalore, A. Skjellum, M. Herbordt


Citations: 9

Abstract

This paper addresses performance-portability and overall performance issues when derived datatypes are used with four MPI implementations: Open MPI, MPICH, MVAPICH2, and Intel MPI. These comparisons are particularly relevant today because most vendor implementations are now based on Open MPI or MPICH rather than on vendor-proprietary code, as was more prevalent in the past. Our findings are that, within a single MPI implementation, there are significant differences in performance as a function of the reasonable encodings of derived datatypes supported by the MPI standard. While this finding may not be surprising, it is important to understand how fundamental vs. arbitrary choices made in early implementations impact the use of derived datatypes to date. A more significant finding is that one cannot reliably choose a single derived datatype format and expect uniform performance portability among these four implementations. That is, the best-performing path under one of the MPI code bases is not the same as the best-performing path under another. Users have to be prepared to recode for a different formulation to move efficiently among MPICH, MVAPICH2, Intel MPI, and Open MPI. This lack of uniformity presents a significant gap in MPI's fundamental purpose of offering performance portability. Specific examination of internal implementation details indicates why performance differs among the implementations. Proposed solutions to this problem include i) revamping datatypes; ii) providing a common, underlying datatype standard used by multiple MPI implementations; and iii) exploring new ways to describe derived datatypes that are optimizable by modern networks and faster than MPI implementations' software-based marshaling and unmarshaling.
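To make "reasonable encodings" of the same layout concrete, the following is a minimal sketch in plain C, not the paper's benchmark code and not MPI itself. It describes one column of a row-major matrix in two ways: a strided description whose fields mirror the `count`, `blocklength`, and `stride` arguments of `MPI_Type_vector`, and an explicit-offset description whose fields mirror the arguments of `MPI_Type_indexed`. The struct names, the two pack functions, and `encodings_agree` are hypothetical helpers introduced only for illustration. Both descriptions select the same elements, yet a software marshaling routine walks them differently, which is the kind of internal difference the abstract points to. The sketch makes no claim about which form is faster in any real implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical strided description; fields mirror the
   (count, blocklength, stride) arguments of MPI_Type_vector. */
typedef struct {
    size_t count;       /* number of blocks */
    size_t blocklength; /* elements per block */
    size_t stride;      /* elements between the starts of blocks */
} vector_layout;

/* Hypothetical explicit-offset description; fields mirror the
   (count, blocklengths, displacements) arguments of MPI_Type_indexed. */
typedef struct {
    size_t count;                /* number of blocks */
    const size_t *blocklengths;  /* elements in each block */
    const size_t *displacements; /* start of each block, in elements */
} indexed_layout;

/* Software marshaling for the strided form: offsets are computed. */
size_t pack_vector(const double *src, vector_layout v, double *dst) {
    size_t n = 0;
    for (size_t b = 0; b < v.count; ++b) {
        memcpy(dst + n, src + b * v.stride, v.blocklength * sizeof(double));
        n += v.blocklength;
    }
    return n; /* number of elements packed */
}

/* Software marshaling for the indexed form: offsets are looked up. */
size_t pack_indexed(const double *src, indexed_layout ix, double *dst) {
    size_t n = 0;
    for (size_t b = 0; b < ix.count; ++b) {
        memcpy(dst + n, src + ix.displacements[b],
               ix.blocklengths[b] * sizeof(double));
        n += ix.blocklengths[b];
    }
    return n; /* number of elements packed */
}

/* Packs column 1 of a 4x4 row-major matrix with both descriptions and
   returns 1 if the packed buffers are identical, 0 otherwise. */
int encodings_agree(void) {
    double m[16];
    for (int i = 0; i < 16; ++i) {
        m[i] = (double)i;
    }

    vector_layout v = {4, 1, 4};
    const size_t lens[4] = {1, 1, 1, 1};
    const size_t disp[4] = {0, 4, 8, 12};
    indexed_layout ix = {4, lens, disp};

    double a[4], b[4];
    size_t na = pack_vector(m + 1, v, a);   /* m + 1 is the top of column 1 */
    size_t nb = pack_indexed(m + 1, ix, b);

    return na == 4 && nb == 4 && memcmp(a, b, sizeof a) == 0;
}
```

Calling `encodings_agree()` from a test or a `main` function checks that the two descriptions pack the same four elements. In a real MPI program the two descriptions would be built with `MPI_Type_vector` and `MPI_Type_indexed`, committed with `MPI_Type_commit`, and handed to the library, whose internal handling of each form is what varies across implementations.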


