Measurement and Analysis of GPU-Accelerated OpenCL Computations on Intel GPUs

2021 IEEE/ACM International Workshop on Programming and Performance Visualization Tools (ProTools). Pub Date: 2021-11-01. DOI: 10.1109/ProTools54808.2021.00009

Aaron Cherian, K. Zhou, Dejan Grubisic, Xiaozhu Meng, J. Mellor-Crummey

Citations: 1

Abstract

Graphics Processing Units (GPUs) have become a key technology for accelerating node performance in supercomputers, including the US Department of Energy’s forthcoming exascale systems. Since the execution model for GPUs differs from that for conventional processors, applications need to be rewritten to exploit GPU parallelism. Performance tools are needed for such GPU-accelerated systems to help developers assess how well applications offload computation onto GPUs. In this paper, we describe extensions to Rice University’s HPCToolkit performance tools that support measurement and analysis of Intel’s DPC++ programming model for GPU-accelerated systems atop an implementation of the industry-standard OpenCL framework for heterogeneous parallelism on Intel GPUs. HPCToolkit supports three techniques for performance analysis of programs atop OpenCL on Intel GPUs. First, HPCToolkit supports profiling and tracing of OpenCL kernels. Second, HPCToolkit supports CPU-GPU blame shifting for OpenCL kernel executions, a profiling technique that can identify code that executes on one or more CPUs while GPUs are idle. Third, HPCToolkit supports fine-grained measurement, analysis, and attribution of performance metrics to OpenCL GPU kernels, including instruction counts, execution latency, and SIMD waste. The paper describes these capabilities and then illustrates their application in case studies with two applications that offload computations onto Intel GPUs.
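
To make the idea of CPU-GPU blame shifting concrete, the following is a minimal sketch, not HPCToolkit's implementation. It assumes a simplified model in which CPU samples arrive as (timestamp, calling-context) pairs and kernel executions as (start, end) intervals; the function name `gpu_idle_blame` and both data layouts are hypothetical, introduced only for illustration. Each CPU sample taken while no kernel is running adds one unit of "GPU idle" blame to the CPU context that was executing.

```python
from collections import defaultdict


def gpu_idle_blame(cpu_samples, kernel_intervals):
    """Attribute GPU idleness to CPU calling contexts.

    cpu_samples: iterable of (timestamp, context) pairs (hypothetical layout).
    kernel_intervals: iterable of (start, end) pairs, half-open [start, end).
    Returns a dict mapping context -> number of samples taken while
    no kernel was executing on the GPU.
    """
    blame = defaultdict(int)
    for timestamp, context in cpu_samples:
        # The GPU counts as busy if the sample falls inside any kernel interval.
        gpu_busy = any(start <= timestamp < end for start, end in kernel_intervals)
        if not gpu_busy:
            # The GPU is idle, so the code running on the CPU gets the blame.
            blame[context] += 1
    return dict(blame)


# Illustrative data: one kernel runs during [2, 4); the CPU is sampled each tick.
samples = [(0, "setup"), (1, "setup"), (2, "wait"), (3, "wait"), (4, "postprocess")]
kernels = [(2, 4)]
result = gpu_idle_blame(samples, kernels)
```

In this toy trace, the contexts that run while the GPU has no work (`setup` and `postprocess`) accumulate blame, while `wait`, which overlaps the kernel, accumulates none. A real tool would work from hardware timestamps and full calling-context trees rather than string labels.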

