Exploration of automatic optimization for CUDA programming

M. Al-Mouhamed, A. ul Hassan Khan
2012 2nd IEEE International Conference on Parallel, Distributed and Grid Computing (PDGC), December 2012. DOI: 10.1109/PDGC.2012.6449791
Citations: 6

Abstract

Graphics Processing Units (GPUs) are gaining ground in high-performance computing. CUDA, an extension of C, is the most widely used parallel programming framework for general-purpose GPU computation. However, writing optimized CUDA programs is complex even for experts. We present a method for restructuring loops into optimized CUDA kernels based on a three-step algorithm: loop tiling, coalesced memory access, and resource optimization. We also establish the relationships between the influencing parameters and propose a method for finding tiling solutions with coalesced memory access that best meet the identified constraints. We further present a simplified algorithm for restructuring loops and rewriting them as efficient CUDA kernels. The execution model of the synthesized kernel distributes the kernel threads uniformly to keep all cores busy, while transferring tailored tiles of data, accessed with a coalesced pattern, to amortize the long latency of the secondary memory. In the evaluation, we implement several simple applications using the proposed restructuring strategy and evaluate their performance in terms of execution time and GPU throughput.
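The abstract names the three restructuring steps but gives no code. The following is a hand-written illustration, not the authors' algorithm: a minimal sketch assuming row-major n×n `float` matrices and the textbook matrix-multiplication loop nest, with a hypothetical tile size `TILE = 16` and hypothetical function names `matmul_tiled` and `matmul_on_gpu`. Each of the three steps is marked in the comments.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Sequential loop nest being restructured (shown for reference only):
//   for (int i = 0; i < n; ++i)
//     for (int j = 0; j < n; ++j)
//       for (int k = 0; k < n; ++k)
//         C[i * n + j] += A[i * n + k] * B[k * n + j];

// Hypothetical tile size. Step 3 (resource optimization, simplified here):
// TILE fixes the block shape (TILE * TILE threads) and the shared-memory
// footprint (2 * 16 * 16 * 4 bytes = 2 KB per block).
constexpr int TILE = 16;

// Step 1 (loop tiling): the i and j loops become a 2-D grid of TILE x TILE
// thread blocks, one thread per output element; the k loop is split into
// TILE-sized chunks staged through shared memory.
__global__ void matmul_tiled(const float* A, const float* B, float* C, int n) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < (n + TILE - 1) / TILE; ++t) {
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;
        // Step 2 (coalesced access): threadIdx.x indexes the contiguous
        // (column) dimension, so adjacent threads in a warp load adjacent
        // addresses from global memory. Out-of-range elements are padded with 0.
        As[threadIdx.y][threadIdx.x] = (row < n && aCol < n) ? A[row * n + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (bRow < n && col < n) ? B[bRow * n + col] : 0.0f;
        __syncthreads();  // wait until both tiles are fully loaded

        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();  // wait before the tiles are overwritten
    }
    if (row < n && col < n)
        C[row * n + col] = acc;
}

// Host wrapper: copies A and B (row-major, n * n each) to the device, launches
// the tiled kernel, and copies C back. Returns false if any CUDA call fails.
bool matmul_on_gpu(const std::vector<float>& A, const std::vector<float>& B,
                   std::vector<float>& C, int n) {
    size_t bytes = static_cast<size_t>(n) * n * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    bool ok = cudaMalloc(&dA, bytes) == cudaSuccess &&
              cudaMalloc(&dB, bytes) == cudaSuccess &&
              cudaMalloc(&dC, bytes) == cudaSuccess &&
              cudaMemcpy(dA, A.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy(dB, B.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        dim3 block(TILE, TILE);
        dim3 grid((n + TILE - 1) / TILE, (n + TILE - 1) / TILE);
        matmul_tiled<<<grid, block>>>(dA, dB, dC, n);
        C.resize(static_cast<size_t>(n) * n);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(C.data(), dC, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return ok;
}
```

The zero padding in the tile loads means n need not be a multiple of `TILE`. In this sketch the choice of `TILE` stands in for the resource-optimization step: larger tiles give more data reuse per load but raise shared-memory and register use per block, which can reduce the number of blocks resident on each multiprocessor.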