Achieving Portability and Performance through OpenACC

2014 First Workshop on Accelerator Programming using Directives. Publication date: 2014-11-16. DOI: 10.1109/WACCPD.2014.10

J. Herdman, W. Gaudin, O. Perks, D. Beckingsale, A. Mallinson, S. Jarvis


Citations: 24

Abstract

OpenACC is a directive-based programming model designed to give existing production codes written in Fortran, C and C++ easy access to emerging advanced architecture systems. It also offers a way to program contemporary technologies without the need to learn complex vendor-specific languages or to understand the hardware at the deepest level. Portability and performance are the key features of this programming model, and both are essential to productivity in real scientific applications. OpenACC is defined by an open standard and supported by a number of vendors. However, the standard is relatively new and the implementations are relatively immature. This paper experimentally evaluates the currently available compilers by assessing two approaches to the OpenACC programming model: the "parallel" and "kernels" constructs. The implementations of both constructs are compared for each vendor, showing performance differences of up to 84%. Additionally, we observe performance differences of up to 13% between the best vendor implementations. OpenACC features that appear to cause performance issues in certain compilers are identified and linked to differing default vector length clauses between vendors. These studies are carried out over a range of hardware, including GPU, APU, Xeon and Xeon Phi based architectures. Finally, OpenACC performance and productivity are compared against the alternative native programming approaches on each targeted platform, including CUDA, OpenCL, OpenMP 4.0 and Intel Offload, in addition to MPI and OpenMP.
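To make the two constructs named in the abstract concrete, the following is a minimal sketch in C, not code from the paper. The function names `saxpy_parallel` and `saxpy_kernels`, the SAXPY loop itself, and the vector length of 128 are all illustrative assumptions. With the "parallel" construct the programmer asserts that the loop is safe to parallelise; with the "kernels" construct the compiler analyses the region and decides how to map it. The explicit `vector_length` clause overrides whatever default a given compiler would otherwise choose.

```c
/* Illustrative sketch only: a SAXPY loop (y = a*x + y) written twice,
 * once per OpenACC construct. Names and the vector length are assumptions.
 * Without an OpenACC compiler the pragmas are ignored and the loops run
 * serially on the host, producing the same results. */

/* "parallel" construct: the programmer states that the loop is parallel.
 * vector_length(128) sets the vector length explicitly instead of relying
 * on the compiler default. */
void saxpy_parallel(int n, float a, const float *restrict x, float *restrict y)
{
    #pragma acc parallel loop vector_length(128) copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

/* "kernels" construct: the compiler analyses the region and chooses the
 * mapping. The "independent" clause tells it the iterations do not depend
 * on each other. */
void saxpy_kernels(int n, float a, const float *restrict x, float *restrict y)
{
    #pragma acc kernels copyin(x[0:n]) copy(y[0:n])
    {
        #pragma acc loop independent
        for (int i = 0; i < n; ++i) {
            y[i] = a * x[i] + y[i];
        }
    }
}
```

Both functions should produce identical results for the same inputs; any difference between them would be in performance, which depends on the compiler and the target hardware.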

