Huayou Su, Chunyuan Zhang, Jun Chai, M. Wen, N. Wu, Ju Ren
Title: A high-efficient software parallel CAVLC encoder based on GPU
Published in: 2011 34th International Conference on Telecommunications and Signal Processing (TSP), 2011-10-13
DOI: 10.1109/TSP.2011.6043672 (https://doi.org/10.1109/TSP.2011.6043672)
Citations: 3
Abstract
This article presents an efficient parallel CAVLC encoder for H.264/AVC that runs on a GPU. By restructuring the encoder, three kinds of dependence are eliminated or weakened: context dependence, memory-access dependence, and control dependence. CAVLC execution is divided into three stages: two scans, component-oriented coding, and lag packing. Within each stage, the data of a frame can be processed in parallel. Experimental results show that the proposed encoder achieves a speedup of more than 30 times over the CPU version and encodes 720p video at 30 frames per second in real time. Its throughput is 6.29 to 11.17 times higher than that of published software encoders on DSP and multi-core platforms.
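The abstract does not say how the packing stage is implemented. As a rough illustration of why separating coding from packing removes a serial dependence, the sketch below uses a generic technique that is assumed here and not taken from the paper: an exclusive prefix sum over codeword lengths gives every codeword its starting bit position, after which each codeword can be written independently. The names `bit_offsets` and `pack` are hypothetical, and the example codewords are arbitrary bit patterns, not real CAVLC table entries.

```python
from itertools import accumulate

def bit_offsets(lengths):
    """Exclusive prefix sum: starting bit position of each codeword."""
    return list(accumulate(lengths, initial=0))[:-1]

def pack(codes):
    """Pack (value, bit_length) pairs into an MSB-first byte string.

    Returns (packed_bytes, total_bits).
    """
    lengths = [n for _, n in codes]
    offsets = bit_offsets(lengths)
    total_bits = sum(lengths)
    out = bytearray((total_bits + 7) // 8)
    # Each codeword needs only its own offset, so on a GPU every iteration
    # of this outer loop could be an independent thread (with atomic ORs
    # where two codewords share a byte). Here it runs sequentially.
    for (value, n), start in zip(codes, offsets):
        for i in range(n):
            bit = (value >> (n - 1 - i)) & 1
            pos = start + i
            out[pos // 8] |= bit << (7 - pos % 8)
    return bytes(out), total_bits

if __name__ == "__main__":
    # Arbitrary example codewords: 1, 011, 00101
    packed, nbits = pack([(0b1, 1), (0b011, 3), (0b00101, 5)])
    print(packed.hex(), nbits)
```

Without the prefix sum, each codeword's write position would depend on the lengths of all codewords before it, which forces a sequential pass. Computing the offsets first turns that chain into a scan, a primitive that parallelizes well.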