CUDAsap: Statically-Determined Execution Statistics as Alternative to Execution-Based Profiling

Yannick Emonds, Lorenz Braun, H. Fröning
Published in: 2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 2023-05-01
DOI: 10.1109/CCGrid57682.2023.00021 (https://doi.org/10.1109/CCGrid57682.2023.00021)

Abstract

A wide variety of GPU types exists today, which raises questions for high-level tasks such as provisioning and scheduling. To predict execution time on different GPU types accurately, we propose a method that obtains execution statistics from compile-time static code analysis, in which the control flow graph of the code's basic blocks is determined. This graph is represented as an adjacency matrix and used in a system of linear equations to calculate the basic block execution frequencies. The kernel itself does not need to be executed for this analysis. We evaluate the proposed method on five different benchmark suites and show that 76 out of 79 evaluated kernels can be analyzed with an average error of 0.4 %, which stems primarily from differing LLVM versions, and with an average prediction time of 203.96 ms. Furthermore, repetitive kernels make memoization effective, and the underlying analysis is largely independent of problem size.
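
The step from adjacency matrix to basic block execution frequencies can be illustrated with a small sketch. The abstract does not give the exact formulation, so the following is only one common way to set up such a system, not necessarily the authors' method: each block's frequency equals the entry contribution plus the weighted frequencies of its predecessors, i.e. (I - W^T) f = e. The function name `block_frequencies`, the four-block example graph, and the edge weights (a loop assumed to run 4 iterations) are all hypothetical.

```python
from fractions import Fraction


def block_frequencies(weights, entry=0):
    """Solve (I - W^T) f = e for the execution frequency f of each basic block.

    weights[i][j] is the assumed fraction of executions of block i that
    continue to block j (a weighted adjacency matrix of the CFG).
    """
    n = len(weights)
    # Augmented matrix [I - W^T | e], using exact rational arithmetic.
    a = [
        [Fraction(int(i == j)) - Fraction(weights[j][i]) for j in range(n)]
        + [Fraction(int(i == entry))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with a simple non-zero pivot search.
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular system: frequencies are not determined")
        a[col], a[pivot] = a[pivot], a[col]
        pivot_value = a[col][col]
        a[col] = [x / pivot_value for x in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                factor = a[r][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [row[n] for row in a]


# Hypothetical CFG: 0 = entry, 1 = loop header, 2 = loop body, 3 = exit.
# A loop with 4 iterations evaluates the header 5 times, so the header
# branches to the body with weight 4/5 and to the exit with weight 1/5.
W = [
    [0, 1, 0, 0],
    [0, 0, Fraction(4, 5), Fraction(1, 5)],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
freqs = block_frequencies(W)
print([int(f) for f in freqs])  # [1, 5, 4, 1]
```

In this sketch the frequencies depend only on the graph and its edge weights, which is consistent with the abstract's point that no kernel execution is needed. How the weights are derived at compile time is not stated in the abstract and is left open here.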