Batch-Pipelining for H.264 Decoding on Multicore Systems

2010 Data Compression Conference. Pub Date: 2010-03-24. DOI: 10.1109/DCC.2010.57

Tang-Hsun Tu, Chih-wen Hsueh


Citation count: 4

Abstract

Pipelining has been applied in many areas to improve performance by overlapping the execution of computing stages. However, it is difficult to apply to H.264/AVC decoding at the frame level, because the bitstreams are encoded with many dependencies and little parallelism is left to exploit. Therefore, many studies rely on hardware assistance. Fortunately, pure software pipelining can be applied to H.264/AVC decoding at the macroblock level with reasonable performance gain. However, the pipeline stages might need to synchronize with other stages, which incurs considerable extra overhead. Moreover, this overhead becomes relatively larger as the stages themselves execute faster with better hardware and software optimization. We first group multiple stages into larger groups, as "batched" pipelining, to execute concurrently on multicore systems. Stages in different groups might not need to synchronize with each other, so the approach incurs little overhead and can be highly scalable. On this basis, we propose a novel and effective batch-pipeline (BP) approach for H.264/AVC decoding on multicore systems. Moreover, because of its flexibility, BP can be combined with other hardware approaches or software technologies to further improve performance. To optimize our approach, we analyze how to group the macroblocks and derive closed-form formulas to guide the grouping. We also conduct experiments on various bitstreams to verify our approach. The results show a speedup of up to 93% over a published optimized H.264 decoder, reaching up to 249 and 70 FPS for 720P and 1080P resolutions, respectively, on a 4-core machine. We believe our batch-pipelining approach creates a new effective direction for multimedia software codec development.
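The synchronization argument in the abstract can be illustrated with a minimal sketch. It is not the authors' BP algorithm or their grouping formulas: it only shows, under simplifying assumptions, that handing work between two pipeline stages in batches reduces the number of synchronization points. The functions `stage_a`, `stage_b`, and `run_pipeline`, and the use of integers in place of macroblocks, are hypothetical and chosen for illustration.

```python
import queue
import threading


def stage_a(mb):
    # Hypothetical stand-in for an earlier decoding stage.
    return mb * 2


def stage_b(mb):
    # Hypothetical stand-in for a later decoding stage.
    return mb + 1


def run_pipeline(macroblocks, batch_size):
    """Run a two-stage pipeline on two threads, handing off whole batches.

    Returns (results, handoffs), where handoffs counts the queue transfers
    of work, i.e. the points at which the two stages synchronize.
    """
    q = queue.Queue()
    results = []
    handoffs = 0

    def producer():
        nonlocal handoffs
        for start in range(0, len(macroblocks), batch_size):
            chunk = macroblocks[start:start + batch_size]
            q.put([stage_a(mb) for mb in chunk])  # one handoff per batch
            handoffs += 1
        q.put(None)  # sentinel: no more batches

    def consumer():
        while True:
            batch = q.get()
            if batch is None:
                break
            results.extend(stage_b(mb) for mb in batch)

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, handoffs
```

With 8 items, `run_pipeline(list(range(8)), 1)` performs 8 handoffs, while `run_pipeline(list(range(8)), 4)` performs 2 and produces the same results. In a real decoder the batch size would also be constrained by dependencies between macroblocks, which this sketch ignores.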



