Execution-time optimization based on thread and block repartitions on a graphic processing unit

Randa Khemiri, F. Sayadi, Haythem Bahri, Marwa Chouchene, Mohamed Atri
2017 International Conference on Engineering & MIS (ICEMIS), May 2017. DOI: 10.1109/ICEMIS.2017.8273052
Citations: 2

Abstract

With the rapid development of multimedia technologies and network communication, parallel architectures such as the Graphics Processing Unit (GPU) have been introduced into high-performance computing. However, how to program the GPU and how to obtain the best execution time usually remain an art. In this paper, a search is performed over the number of threads and blocks used to compute a 64×64 Prediction Unit (PU64) in High Efficiency Video Coding (HEVC). The method is implemented with the Compute Unified Device Architecture (CUDA) and aims to optimize the GPU execution time. Experimental results show that the best grid topology for running the GPU kernel is obtained with 128 blocks and 32 threads. This repartition gives the minimum GPU execution time, and the speed-up obtained over the CPU implementation is around 50%.
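The abstract does not give the kernel itself. As a hedged illustration of what a thread/block repartition means for one PU64, the Python sketch below models on the host how a CUDA grid of B blocks × T threads could cover the 64 × 64 = 4096 samples of a prediction unit using the common grid-stride loop idiom. The one-thread-per-sample mapping, the function names and the candidate ranges (powers of two, threads per block a multiple of the 32-thread warp, at most 1024) are assumptions for illustration, not the authors' code. The model only checks coverage and per-thread load; ranking the topologies (such as the reported 128 blocks and 32 threads) still requires timing each kernel launch on the GPU, for example with CUDA events.

```python
"""Host-side model of a CUDA grid covering one 64x64 prediction unit (PU64).

Hypothetical illustration, not the paper's kernel: each CUDA thread is
identified by (blockIdx, threadIdx) and visits samples with the grid-stride
loop  i = blockIdx*blockDim + threadIdx; i < n; i += gridDim*blockDim.
"""

PU_SIZE = 64
N_SAMPLES = PU_SIZE * PU_SIZE  # 4096 samples in a PU64


def samples_for_thread(block_idx, thread_idx, blocks, threads_per_block, n=N_SAMPLES):
    """Sample indices visited by one thread in a grid-stride loop."""
    start = block_idx * threads_per_block + thread_idx  # global thread id
    stride = blocks * threads_per_block                 # total threads in the grid
    return list(range(start, n, stride))


def coverage(blocks, threads_per_block, n=N_SAMPLES):
    """How many times each sample is visited by the whole grid."""
    counts = [0] * n
    for b in range(blocks):
        for t in range(threads_per_block):
            for i in samples_for_thread(b, t, blocks, threads_per_block, n):
                counts[i] += 1
    return counts


def work_per_thread(blocks, threads_per_block, n=N_SAMPLES):
    """Maximum number of samples any single thread processes (ceiling division)."""
    total_threads = blocks * threads_per_block
    return -(-n // total_threads)


def idle_threads(blocks, threads_per_block, n=N_SAMPLES):
    """Threads that get no sample at all when the grid has more threads than samples."""
    return max(0, blocks * threads_per_block - n)


def candidate_topologies(max_blocks=1024, max_threads=1024, warp=32):
    """Assumed search space: power-of-two block counts, threads per block from one warp up."""
    pairs = []
    b = 1
    while b <= max_blocks:
        t = warp
        while t <= max_threads:
            pairs.append((b, t))
            t *= 2
        b *= 2
    return pairs


if __name__ == "__main__":
    for blocks, threads in [(128, 32), (32, 128), (16, 64), (256, 32)]:
        # Every topology visits each sample exactly once with a grid-stride loop.
        assert all(c == 1 for c in coverage(blocks, threads))
        print(f"{blocks:4d} blocks x {threads:4d} threads: "
              f"{work_per_thread(blocks, threads)} sample(s)/thread, "
              f"{idle_threads(blocks, threads)} idle thread(s)")
```

Every topology in this model covers each sample exactly once, so correctness does not depend on the split; what changes is how many samples each thread handles and how many threads sit idle, which is why the execution time has to be measured per configuration. With 128 × 32 = 4096 threads, each thread handles exactly one sample of the PU64.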