Execution-time optimization based on thread and block repartitions on a graphic processing unit
Randa Khemiri, F. Sayadi, Haythem Bahri, Marwa Chouchene, Mohamed Atri
2017 International Conference on Engineering & MIS (ICEMIS), published 2017-05-01
DOI: 10.1109/ICEMIS.2017.8273052
Citations: 2
Abstract
With the rapid development of multimedia technologies and network communication, parallel architectures such as the Graphics Processing Unit (GPU) have been introduced into high-performance computing. However, programming a GPU and obtaining the best execution time usually remain an art. In this paper, we search for the numbers of threads and blocks that give the fastest computation of a 64×64 Prediction Unit (PU64) in High Efficiency Video Coding (HEVC). The implementation uses the Compute Unified Device Architecture (CUDA), and the method aims to minimize GPU execution time. Experimental results show that the best grid topology for running the GPU kernel is 128 blocks with 32 threads per block. This distribution gives the minimum GPU execution time and, compared with the CPU implementation, yields a speed-up of around 50%.
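The abstract does not describe the search procedure itself. The Python sketch below shows one way an exhaustive thread/block search of this kind could be organized; it is not the authors' implementation. It assumes power-of-two block counts from 1 to 1024, power-of-two block sizes from one 32-thread warp up to 1024 threads (the per-block limit on current NVIDIA GPUs), and timing by the median of repeated runs after warm-up. `launch` is a hypothetical callback that runs the PU64 kernel with a given grid and returns only after the GPU has finished.

```python
import statistics
import time
from typing import Callable, Dict, Iterable, List, Tuple

Config = Tuple[int, int]  # (blocks per grid, threads per block)
WARP_SIZE = 32            # NVIDIA warp width; block sizes are kept as multiples of it


def powers_of_two(lo: int, hi: int) -> List[int]:
    """Return lo, 2*lo, 4*lo, ... up to and including hi."""
    values, v = [], lo
    while v <= hi:
        values.append(v)
        v *= 2
    return values


def candidate_configs(max_blocks: int = 1024, max_threads: int = 1024) -> List[Config]:
    """Assumed search space: every power-of-two (blocks, threads) pair."""
    return [(b, t)
            for b in powers_of_two(1, max_blocks)
            for t in powers_of_two(WARP_SIZE, max_threads)]


def wall_clock_measure(launch: Callable[[int, int], None],
                       warmup: int = 2, repeats: int = 10) -> Callable[[int, int], float]:
    """Wrap a (hypothetical) kernel-launch callback into measure(blocks, threads).

    `launch` must block until the GPU work is done (e.g. after a device
    synchronize); otherwise only the asynchronous launch would be timed.
    """
    def measure(blocks: int, threads: int) -> float:
        for _ in range(warmup):            # discard first runs (caches, clocks)
            launch(blocks, threads)
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            launch(blocks, threads)
            samples.append(time.perf_counter() - start)
        return statistics.median(samples)  # median is robust to outliers
    return measure


def search_grid(measure: Callable[[int, int], float],
                configs: Iterable[Config]) -> Tuple[Config, Dict[Config, float]]:
    """Time every configuration and return the fastest plus all timings."""
    timings = {cfg: measure(*cfg) for cfg in configs}
    best = min(timings, key=timings.get)
    return best, timings
```

With a real kernel, the call would look like `search_grid(wall_clock_measure(launch_pu64), candidate_configs())`, where `launch_pu64` is a hypothetical wrapper around the CUDA kernel launch. Note that 128 blocks × 32 threads = 4,096 threads, exactly one per sample of a 64×64 PU; whether the authors' kernel maps one sample to each thread is not stated in the abstract.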