Grover:通过禁用OpenCL内核中的本地内存使用来寻找性能改进

Jianbin Fang, H. Sips, P. Jääskeläinen, A. Varbanescu
{"title":"Grover:通过禁用OpenCL内核中的本地内存使用来寻找性能改进","authors":"Jianbin Fang, H. Sips, P. Jääskeläinen, A. Varbanescu","doi":"10.1109/ICPP.2014.25","DOIUrl":null,"url":null,"abstract":"Due to the diversity of processor architectures and application memory access patterns, the performance impact of using local memory in OpenCL kernels has become unpredictable. For example, enabling the use of local memory for an OpenCL kernel can be beneficial for the execution on a GPU, but can lead to performance losses when running on a CPU. To address this unpredictability, we propose an empirical approach: by disabling the use of local memory in OpenCL kernels, we enable users to compare the kernel versions with and without local memory, and further choose the best performing version for a given platform. To this end, we have designed Grover, a method to automatically remove local memory usage from OpenCL kernels. In particular, we create a correspondence between the global and local memory spaces, which is used to replace local memory accesses by global memory accesses. We have implemented this scheme in the LLVM framework as a compiling pass, which automatically transforms an OpenCL kernel with local memory to a version without it. We have validated Grover with 11 applications, and found that it can successfully disable local memory usage for all of them. We have compared the kernels with and without local memory on three different processors, and found performance improvements for more than a third of the test cases after Grover disabled local memory usage. We conclude that such a compiler pass can be beneficial for performance, and, because it is fully automated, it can be used as an auto-tuning step for OpenCL kernels.","PeriodicalId":441115,"journal":{"name":"2014 43rd International Conference on Parallel Processing","volume":"22 3 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"15","resultStr":"{\"title\":\"Grover: Looking for Performance Improvement by Disabling Local Memory Usage in OpenCL Kernels\",\"authors\":\"Jianbin Fang, H. Sips, P. Jääskeläinen, A. Varbanescu\",\"doi\":\"10.1109/ICPP.2014.25\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Due to the diversity of processor architectures and application memory access patterns, the performance impact of using local memory in OpenCL kernels has become unpredictable. For example, enabling the use of local memory for an OpenCL kernel can be beneficial for the execution on a GPU, but can lead to performance losses when running on a CPU. To address this unpredictability, we propose an empirical approach: by disabling the use of local memory in OpenCL kernels, we enable users to compare the kernel versions with and without local memory, and further choose the best performing version for a given platform. To this end, we have designed Grover, a method to automatically remove local memory usage from OpenCL kernels. In particular, we create a correspondence between the global and local memory spaces, which is used to replace local memory accesses by global memory accesses. We have implemented this scheme in the LLVM framework as a compiling pass, which automatically transforms an OpenCL kernel with local memory to a version without it. We have validated Grover with 11 applications, and found that it can successfully disable local memory usage for all of them. We have compared the kernels with and without local memory on three different processors, and found performance improvements for more than a third of the test cases after Grover disabled local memory usage. We conclude that such a compiler pass can be beneficial for performance, and, because it is fully automated, it can be used as an auto-tuning step for OpenCL kernels.\",\"PeriodicalId\":441115,\"journal\":{\"name\":\"2014 43rd International Conference on Parallel Processing\",\"volume\":\"22 3 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-10-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"15\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 43rd International Conference on Parallel Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICPP.2014.25\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 43rd International Conference on Parallel Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICPP.2014.25","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 15

摘要

由于处理器体系结构和应用程序内存访问模式的多样性,在OpenCL内核中使用本地内存对性能的影响已经变得不可预测。例如,为OpenCL内核启用本地内存可能有利于GPU上的执行,但在CPU上运行时可能会导致性能损失。为了解决这种不可预测性,我们提出了一种经验方法:通过在OpenCL内核中禁用本地内存,我们使用户能够比较有和没有本地内存的内核版本,并进一步为给定平台选择性能最佳的版本。为此,我们设计了Grover,这是一种自动从OpenCL内核中删除本地内存使用的方法。特别是,我们在全局和局部内存空间之间创建了对应关系,用全局内存访问来替换局部内存访问。我们已经在LLVM框架中实现了这个方案,作为一个编译通道,它自动将有本地内存的OpenCL内核转换为没有本地内存的版本。我们已经用11个应用程序验证了Grover,发现它可以成功地禁用所有应用程序的本地内存使用。我们在三种不同的处理器上比较了有和没有本地内存的内核,发现在Grover禁用本地内存使用后,超过三分之一的测试用例的性能得到了改善。我们得出的结论是,这样的编译器传递对性能是有益的,而且,由于它是完全自动化的,它可以用作OpenCL内核的自动调优步骤。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Grover: Looking for Performance Improvement by Disabling Local Memory Usage in OpenCL Kernels
Due to the diversity of processor architectures and application memory access patterns, the performance impact of using local memory in OpenCL kernels has become unpredictable. For example, enabling the use of local memory for an OpenCL kernel can be beneficial for the execution on a GPU, but can lead to performance losses when running on a CPU. To address this unpredictability, we propose an empirical approach: by disabling the use of local memory in OpenCL kernels, we enable users to compare the kernel versions with and without local memory, and further choose the best performing version for a given platform. To this end, we have designed Grover, a method to automatically remove local memory usage from OpenCL kernels. In particular, we create a correspondence between the global and local memory spaces, which is used to replace local memory accesses by global memory accesses. We have implemented this scheme in the LLVM framework as a compiling pass, which automatically transforms an OpenCL kernel with local memory to a version without it. We have validated Grover with 11 applications, and found that it can successfully disable local memory usage for all of them. We have compared the kernels with and without local memory on three different processors, and found performance improvements for more than a third of the test cases after Grover disabled local memory usage. We conclude that such a compiler pass can be beneficial for performance, and, because it is fully automated, it can be used as an auto-tuning step for OpenCL kernels.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信