[Engineering Paper] RECKA and RPromF: Two Frama-C Plug-ins for Optimizing Registers Usage in CUDA, OpenACC and OpenMP Programs

R. Diarra, A. Mérigot, B. Vincke
Published in: 2018 IEEE 18th International Working Conference on Source Code Analysis and Manipulation (SCAM), 2018-09-01
DOI: https://doi.org/10.1109/SCAM.2018.00029
Citations: 0

Abstract

Pointer aliasing still hinders compiler optimizations. The ISO C99 standard added the restrict keyword, which lets programmers declare non-aliasing as an aid to the compiler's optimizer. Annotating pointers with the restrict keyword is still left to the programmer, however, and the task is generally tedious and error-prone. Scalar replacement is an optimization widely used by compilers. In this paper, we present two new Frama-C plug-ins: RECKA, which automatically annotates CUDA kernel arguments with the restrict keyword, and RPromF, which performs scalar replacement in OpenACC and OpenMP 4.0/4.5 codes for GPUs. More specifically, RECKA works as follows: (i) an alias analysis is performed on the CUDA kernels and their callers; (ii) if no alias is found, the CUDA kernels are cloned, the clones are renamed, and their arguments are annotated with the restrict qualifier; and (iii) instructions are added at kernel call sites to perform a runtime less-than check on the kernels' actual parameters and determine whether the clone or the original kernel must be called. RPromF includes five main steps: (i) OpenACC/OpenMP offloading regions are identified; (ii) the functions containing these offloading regions, and their callers, are analyzed to check that there is no alias; (iii) if there is no alias, the offloading regions are cloned; (iv) the clones' instructions are analyzed to retrieve data reuse information and perform scalar replacement; and (v) instructions are added so that the optimized clone is used whenever possible. We have evaluated the two plug-ins on the PolyBench benchmark suite. The results show that both scalar replacement and the use of the restrict keyword are effective for improving the overall performance of OpenACC, OpenMP 4.0/4.5 and CUDA codes.
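The abstract contains no code, so the following is only a rough host-side C analogue of the clone-and-dispatch scheme it describes, not the output of RECKA or RPromF. All names (`smooth`, `smooth_restrict`, `ranges_disjoint`, `smooth_dispatch`) are hypothetical. Reading the "less-than check" as an address-range comparison is also an assumption. The clone combines restrict-qualified arguments with a hand-written scalar replacement, and the wrapper picks the clone only when the runtime check shows that the two buffers do not overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Original routine: out and in may alias, so the compiler has to assume
   that every store to out[i] can change later loads from in[]. */
void smooth(float *out, const float *in, size_t n) {
    for (size_t i = 1; i + 1 < n; ++i)
        out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0f;
}

/* Hypothetical optimized clone: arguments are restrict-qualified, and the
   values reused across iterations are kept in scalars (left, mid) instead
   of being reloaded from memory. This is the scalar replacement idea. */
void smooth_restrict(float *restrict out, const float *restrict in, size_t n) {
    if (n < 3)
        return;
    float left = in[0];
    float mid = in[1];
    for (size_t i = 1; i + 1 < n; ++i) {
        float right = in[i + 1];          /* the only load per iteration */
        out[i] = (left + mid + right) / 3.0f;
        left = mid;                       /* rotate the reused values */
        mid = right;
    }
}

/* Assumed form of the runtime check: two byte ranges are disjoint when one
   ends at or before the start of the other. Addresses are compared as
   integers; overflow of the additions is not handled in this sketch. */
int ranges_disjoint(const void *a, size_t a_bytes,
                    const void *b, size_t b_bytes) {
    uintptr_t pa = (uintptr_t)a;
    uintptr_t pb = (uintptr_t)b;
    return pa + a_bytes <= pb || pb + b_bytes <= pa;
}

/* Call-site wrapper: returns 1 if the restrict clone was used, 0 if the
   original routine was used because the buffers overlap. */
int smooth_dispatch(float *out, const float *in, size_t n) {
    assert(out != NULL && in != NULL);
    if (ranges_disjoint(out, n * sizeof *out, in, n * sizeof *in)) {
        smooth_restrict(out, in, n);
        return 1;
    }
    smooth(out, in, n);
    return 0;
}
```

Calling `smooth_dispatch(b, a, n)` with two separate arrays takes the clone, while `smooth_dispatch(buf + 1, buf, n)` falls back to the original. Keeping the original alongside the clone means a failed check costs only the comparison, never correctness.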