Authors: Yi Li, D. Hu, Wenxiang Zhang
Published in: Proceedings of the 3rd International Conference on Computer Science and Application Engineering, 2019-10-22
DOI: https://doi.org/10.1145/3331453.3361646
Fine Granular Multilevel Parallel Algorithm for HEVC Decoding Based on Tilera-GX36 Multi-core Platform
This paper presents a fine-granular multilevel parallel algorithm for HEVC decoding that focuses on two modules: pixel reconstruction and the in-loop filter (ILF). In the pixel reconstruction module, parallel decoding algorithms based on CTU rows, such as wavefront parallel processing (WPP), can suffer from unbalanced load among decoding threads, because the processing complexity of CTUs may differ greatly from one CTU row to another. To address this, we propose a fine-granular parallel algorithm based on WPP. In the ILF module, we present a fast fusion ILF algorithm that tightly couples the deblocking filter (DBF) and sample adaptive offset (SAO) through a new CTU-like processing unit. In addition, pipeline parallelism is introduced to reduce the latency between the two modules. We implement our method on the Tilera-GX36 multi-core platform, with each task executed by a thread bound to a separate core to make full use of the platform's parallel computing performance. Experimental results show that the speedups of the proposed algorithms are 1.2, 1.13, and 1.8 times higher, respectively, than those of previous schemes.
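
The load-imbalance problem described for row-based WPP can be illustrated with a small scheduling model. The sketch below is not the algorithm proposed in the paper. It models only baseline WPP, assuming one thread per CTU row and the standard WPP dependency in which a CTU can start once its left neighbour and the above-right CTU of the previous row are finished. The function names and the per-CTU cost values are hypothetical and chosen purely for illustration.

```python
from typing import List


def wpp_finish_times(cost: List[List[float]]) -> List[List[float]]:
    """Earliest finish time of every CTU under baseline WPP.

    cost[r][c] is a hypothetical decoding cost for the CTU in row r,
    column c. One thread per CTU row is assumed.
    """
    rows, cols = len(cost), len(cost[0])
    finish = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Dependency on the left neighbour in the same row.
            left = finish[r][c - 1] if c > 0 else 0.0
            # Dependency on the above-right CTU; at the right edge of the
            # picture the last CTU of the row above is used instead.
            if r > 0:
                above_right = finish[r - 1][min(c + 1, cols - 1)]
            else:
                above_right = 0.0
            finish[r][c] = max(left, above_right) + cost[r][c]
    return finish


def row_idle_times(cost: List[List[float]]) -> List[float]:
    """Time each row thread spends stalled between its first and last CTU."""
    finish = wpp_finish_times(cost)
    idle = []
    for r, row_cost in enumerate(cost):
        start = finish[r][0] - row_cost[0]   # when the row thread begins work
        busy = sum(row_cost)                 # time spent actually decoding
        idle.append(finish[r][-1] - start - busy)
    return idle


if __name__ == "__main__":
    # One expensive CTU in row 0 stalls the thread that decodes row 1.
    costs = [[1, 1, 5, 1],
             [1, 1, 1, 1]]
    print(wpp_finish_times(costs))
    print(row_idle_times(costs))
```

With uniform costs every row thread runs without stalls after its initial two-CTU offset. In the example above, the single expensive CTU in the first row forces the second row's thread to wait, which is the kind of imbalance a finer-grained work distribution aims to reduce.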