Towards GPU HEVC intra decoding: Seizing fine-grain parallelism
D. Souza, A. Ilic, N. Roma, L. Sousa
2015 IEEE International Conference on Multimedia and Expo (ICME), published 2015-08-06
DOI: https://doi.org/10.1109/ICME.2015.7177515
To satisfy the growing demands on real-time video decoders for high frame resolutions, novel GPU parallel algorithms are proposed herein for fully compliant HEVC de-quantization, inverse transform and intra prediction. The proposed algorithms are designed to exploit the fine-grain parallelism within these computationally demanding and highly data-dependent modules. Moreover, the proposed approaches allow efficient utilization of the GPU computational resources, while carefully managing data accesses in the complex GPU memory hierarchy. The experimental results show that real-time processing is achieved for all tested sequences and the most demanding QP, with average frame rates of 118.6, 89.2 and 49.7 fps for Full HD, 2160p and Ultra HD 4K sequences, respectively.
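To make the notion of fine-grain parallelism concrete, the following is a minimal sketch of HEVC de-quantization for the simplest case (scaling lists disabled, so the flat factor 16 applies). It is not the authors' GPU kernel: the function names, the flattened-list block layout and the default 8-bit depth are assumptions made for illustration, and Python is used only to show the arithmetic. The point is that each coefficient is computed from its own level and the block parameters alone, so a GPU could in principle assign one thread per coefficient.

```python
# Sketch only: flat HEVC de-quantization (scaling lists disabled).
# Names and data layout are hypothetical, not taken from the paper.
LEVEL_SCALE = (40, 45, 51, 57, 64, 72)  # levelScale[] table from the HEVC specification
FLAT_M = 16                              # scaling factor m when scaling lists are not used


def dequantize_coeff(level, qp, log2_tb_size, bit_depth=8):
    """De-quantize one coefficient level; depends on no other coefficient."""
    bd_shift = bit_depth + log2_tb_size - 5
    scaled = (level * FLAT_M * LEVEL_SCALE[qp % 6]) << (qp // 6)
    value = (scaled + (1 << (bd_shift - 1))) >> bd_shift  # round, then shift down
    return max(-32768, min(32767, value))  # clip to the 16-bit coefficient range


def dequantize_block(levels, qp, log2_tb_size, bit_depth=8):
    """Apply the per-coefficient rule to a flattened transform block.

    Every element is independent of the others, so on a GPU each index
    could be handled by its own thread (assumed mapping, for illustration).
    """
    return [dequantize_coeff(lv, qp, log2_tb_size, bit_depth) for lv in levels]
```

For a 4x4 block (log2_tb_size = 2) at QP 22, a level of 1 de-quantizes to 256 under this sketch. The inverse transform and intra prediction named in the abstract carry stronger data dependencies and are not covered here.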