Execution-time optimization based on thread and block repartitions on a graphic processing unit

2017 International Conference on Engineering & MIS (ICEMIS). Pub Date: 2017-05-01. DOI: 10.1109/ICEMIS.2017.8273052

Randa Khemiri, F. Sayadi, Haythem Bahri, Marwa Chouchene, Mohamed Atri

Citations: 2

Abstract

With the rapid development of multimedia technologies and network communication, parallel architectures such as the Graphics Processing Unit (GPU) have been introduced into high-performance computing. However, programming a GPU and obtaining the best execution time often remains an art. This paper presents a search study of the number of threads and blocks best suited to computing a 64×64 Prediction Unit (PU64) in High Efficiency Video Coding (HEVC). The computation is implemented with the Compute Unified Device Architecture (CUDA), and the method is described as a way to optimize GPU execution time. Experimental results show that the best grid topology for running the GPU kernel is 128 blocks of 32 threads. This repartition gives the minimum GPU execution time, with a speed-up of around 50% compared to the CPU execution time.
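To make the search space concrete, the sketch below lists the block and thread pairs that such a study could compare. It is not the authors' code. It assumes one thread per sample of the 64×64 prediction unit (the abstract does not state the mapping), thread counts that are powers of two starting at the CUDA warp size of 32, and the usual limit of 1024 threads per block. The name `candidate_configs` is hypothetical.

```python
# Illustrative sketch only: enumerate (blocks, threads) launch configurations
# that cover every sample of a 64x64 prediction unit (PU64).
# Assumption: one thread per sample; the abstract does not confirm this mapping.

PU_SIZE = 64
TOTAL_SAMPLES = PU_SIZE * PU_SIZE   # 4096 samples in a PU64
WARP_SIZE = 32                      # threads per warp on NVIDIA GPUs
MAX_THREADS_PER_BLOCK = 1024        # common CUDA per-block limit


def candidate_configs(total=TOTAL_SAMPLES,
                      warp=WARP_SIZE,
                      max_threads=MAX_THREADS_PER_BLOCK):
    """Return (blocks, threads) pairs with threads = warp, 2*warp, 4*warp, ..."""
    configs = []
    threads = warp
    while threads <= max_threads:
        # Round up so that blocks * threads always covers all samples.
        blocks = (total + threads - 1) // threads
        configs.append((blocks, threads))
        threads *= 2
    return configs


if __name__ == "__main__":
    for blocks, threads in candidate_configs():
        print(f"{blocks:4d} blocks x {threads:4d} threads "
              f"= {blocks * threads} threads in total")
```

Under these assumptions the first candidate is 128 blocks of 32 threads, which covers exactly 128 × 32 = 4096 = 64 × 64 samples and matches the configuration the abstract reports as fastest. In a real study each pair would be used to launch the kernel and timed on the target GPU; the enumeration alone says nothing about which pair is fastest.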



