OpenMP to CUDA graphs: a compiler-based transformation to enhance the programmability of NVIDIA devices

Proceedings of the 23th International Workshop on Software and Compilers for Embedded Systems. Pub Date: 2020-05-25. DOI: 10.1145/3378678.3391881

Chen Yu, Sara Royuela, E. Quiñones


Citations: 8

Abstract

Heterogeneous computing is increasingly used across a range of computing systems, from HPC to the real-time embedded domain, to meet performance requirements. Given the variety of accelerators (e.g., FPGAs and GPUs), high-level parallel programming models are desirable because they expose the performance of these devices while maintaining adequate programmer productivity. In this regard, OpenMP is a well-known high-level programming model whose tasking and accelerator models can efficiently exploit structured and unstructured parallelism in heterogeneous systems. This paper presents a novel compiler transformation that automatically converts OpenMP code into CUDA graphs, combining the programmability of a high-level model such as OpenMP with the performance of a low-level model such as CUDA. The technique is evaluated on two NVIDIA GPUs: the V100 from the HPC domain and the Jetson AGX from the embedded domain.
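The abstract does not show which OpenMP constructs the transformation consumes, so the following is only a minimal sketch of the kind of input one might expect, assuming the tasking model with `depend` clauses is the starting point. A CUDA graph consists of nodes (operations such as kernel launches) connected by dependency edges, so each task below would plausibly map to one node and each `depend` pair to one edge. The function name `run_diamond`, the constant `MAX_N`, and the diamond-shaped task graph are hypothetical and chosen purely for illustration; the tasks run on the host here rather than on a GPU.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_N 64  /* hypothetical upper bound on the buffer size */

/* Runs a diamond-shaped task graph: A -> {B, C} -> D.
 * In a task-to-graph mapping (an assumption, not the paper's stated
 * method), A..D would be graph nodes and the depend clauses the edges. */
float run_diamond(float seed, size_t n)
{
    float a[MAX_N], b[MAX_N], c[MAX_N];
    float sum = 0.0f;

    assert(n <= MAX_N);

    #pragma omp parallel
    #pragma omp single
    {
        /* Task A: produces a[] */
        #pragma omp task depend(out: a)
        {
            for (size_t i = 0; i < n; ++i)
                a[i] = seed + (float)i;
        }

        /* Task B: reads a[], produces b[]; may run concurrently with C */
        #pragma omp task depend(in: a) depend(out: b)
        {
            for (size_t i = 0; i < n; ++i)
                b[i] = 2.0f * a[i];
        }

        /* Task C: reads a[], produces c[]; may run concurrently with B */
        #pragma omp task depend(in: a) depend(out: c)
        {
            for (size_t i = 0; i < n; ++i)
                c[i] = a[i] + 1.0f;
        }

        /* Task D: waits for both B and C, then reduces into sum */
        #pragma omp task depend(in: b, c) depend(out: sum)
        {
            for (size_t i = 0; i < n; ++i)
                sum += b[i] + c[i];
        }

        /* Wait for all four tasks before leaving the region */
        #pragma omp taskwait
    }
    return sum;
}
```

Calling `run_diamond(0.0f, 8)` returns 92. With GCC or Clang, the pragmas take effect when compiling with `-fopenmp`; without that flag they are ignored and the four blocks simply run in program order, which yields the same result because the dependencies already follow that order.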
