CUDAsap: Statically-Determined Execution Statistics as Alternative to Execution-Based Profiling

2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid). Pub Date: 2023-05-01. DOI: 10.1109/CCGrid57682.2023.00021

Yannick Emonds, Lorenz Braun, H. Fröning


Abstract

Today, a wide variety of GPU types exist, raising questions about high-level tasks such as provisioning and scheduling. To predict execution time on different GPU types accurately, we propose a method that obtains execution statistics from compile-time static code analysis. The analysis determines the control flow graph of the code's basic blocks, represents this graph as an adjacency matrix, and uses the matrix in a system of linear equations to calculate the basic block execution frequencies. The kernel itself does not need to be executed. We evaluate the proposed method on five benchmark suites: 76 of the 79 evaluated kernels can be analyzed, with an average error of 0.4 % (primarily due to different LLVM versions) and an average prediction time of 203.96 ms. Furthermore, repetitive kernels make memoization effective, and the underlying analysis is largely independent of problem size.
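To make the linear-system step concrete, the following is a minimal sketch of how basic block execution frequencies can be derived from a weighted adjacency matrix. It is not the CUDAsap implementation: it assumes the textbook flow-conservation formulation f = e + Wᵀf, in which W[i][j] is the fraction of executions of block i that continue to block j and e marks the entry block. The function name `block_frequencies`, the four-block example graph, and the trip count of 4 are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def block_frequencies(weights, entry=0):
    """Solve (I - W^T) f = e exactly, where f[i] is how often block i runs.

    weights[i][j] is the (assumed known) fraction of executions of block i
    that branch to block j; `entry` is the index of the entry block.
    Raises StopIteration if the system is singular.
    """
    n = len(weights)
    # Build the augmented matrix [I - W^T | e].
    aug = []
    for i in range(n):
        row = [Fraction(int(i == j)) - Fraction(weights[j][i]) for j in range(n)]
        row.append(Fraction(int(i == entry)))
        aug.append(row)
    # Gauss-Jordan elimination with exact rational arithmetic.
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [x / pivot_value for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


# Hypothetical CFG: 0 = entry, 1 = loop header, 2 = loop body, 3 = exit.
# The loop is assumed to run 4 iterations, so the header is evaluated 5 times
# and takes the edge to the body in 4 of those 5 evaluations.
W = [
    [0, 1, 0, 0],                            # entry  -> header
    [0, 0, Fraction(4, 5), Fraction(1, 5)],  # header -> body or exit
    [0, 1, 0, 0],                            # body   -> header (back edge)
    [0, 0, 0, 0],                            # exit has no successors
]
freqs = block_frequencies(W)
print(freqs)
```

In this sketch the branch fractions are supplied by hand; in a static analysis they would have to come from reasoning about loop bounds and branch conditions, which the abstract does not detail. Exact rational arithmetic is used here only so that the resulting frequencies come out as whole numbers without rounding noise.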
