CNN Quadtree Depth Decision Prediction for Block Partitioning in HEVC Intra-Mode

2023 Data Compression Conference (DCC) Pub Date : 2023-03-01 DOI:10.1109/DCC55655.2023.00054

Iris Linck, A. T. Gómez, G. Alaghband

Abstract

High Efficiency Video Coding (HEVC) is the new international standard for digital video coding. HEVC achieves higher compression than its predecessor, but at the cost of a dramatic increase in coding complexity. The increase comes from the recursive quadtree used to partition every frame into blocks of various sizes, a process called prediction mode. We propose three CNNs based on VGGNet, one for each CU size (64x64, 32x32, and 16x16), as shown in Figure 1. These networks predict the quadtree levels of HEVC CU blocks and so reduce its coding complexity. The new CNNs use fewer convolutional layers than the original VGGNet while keeping its 3x3 filters. Because our model is designed to recognize the quadtree structure of a block of pixels rather than image categories, a shallow version of VGGNet combined with our CU partition datasets will provide fast and accurate results. Accuracy is further improved because the input size of each CNN matches the size of the CU encoded by HEVC, which avoids losing CU texture features. Our CNN models learn from three customized datasets of CU blocks encoded at a QP of 32. This removes the need to introduce QP as a parameter in the loss function, as other works do, and further increases accuracy. Given the success of this approach, future models will be trained separately for QPs 22, 27, and 37.
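The following Python sketch shows how three size-specific predictors could divide the work on the recursive CU quadtree of one 64x64 CTU. It assumes that each network returns a binary split/no-split decision for a CU of its own size. The abstract says only that the CNNs predict quadtree levels. So that the example runs without model weights, the trained CNNs are replaced by a hypothetical variance threshold (`variance_split`). `partition_ctu` and its `(x, y, size, depth)` leaf format are also illustrative and are not the authors' code.

```python
from typing import Callable, Dict, List, Tuple

Block = List[List[int]]                   # square block of luma samples, one list per row
SplitPredictor = Callable[[Block], bool]  # True means "split this CU into four"
Leaf = Tuple[int, int, int, int]          # (x, y, size, depth) of a final CU


def sub_block(block: Block, row: int, col: int, size: int) -> Block:
    """Extract the size x size sub-block whose top-left corner is (row, col)."""
    return [r[col:col + size] for r in block[row:row + size]]


def partition_ctu(block: Block, predictors: Dict[int, SplitPredictor],
                  x: int = 0, y: int = 0, depth: int = 0) -> List[Leaf]:
    """Recursively partition a CTU, querying the predictor registered for each CU size."""
    size = len(block)
    predict = predictors.get(size)
    # No predictor for this size (e.g. 8x8, the smallest HEVC CU) or "no split": leaf.
    if predict is None or not predict(block):
        return [(x, y, size, depth)]
    half = size // 2
    leaves: List[Leaf] = []
    for dy in (0, half):          # visit the four quadrants in z-scan order
        for dx in (0, half):
            child = sub_block(block, dy, dx, half)
            leaves += partition_ctu(child, predictors, x + dx, y + dy, depth + 1)
    return leaves


def variance_split(threshold: float) -> SplitPredictor:
    """Hypothetical stand-in for a trained CNN: split when sample variance is high."""
    def predict(block: Block) -> bool:
        pixels = [p for row in block for p in row]
        mean = sum(pixels) / len(pixels)
        var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
        return var > threshold
    return predict


# One predictor per CU size, mirroring the three size-specific networks.
predictors = {size: variance_split(100.0) for size in (64, 32, 16)}

flat_ctu = [[128] * 64 for _ in range(64)]
print(partition_ctu(flat_ctu, predictors))  # [(0, 0, 64, 0)]
```

Each predictor only receives blocks of the size it was built for, so the input is never resized. This matches the abstract's point about preserving CU texture features.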

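According to the abstract, each network learns from its own dataset of CU blocks encoded at QP 32. It does not describe how the labels are obtained. The sketch below makes two assumptions: the encoder, run at QP 32, exports one CU depth value (0 to 3) per 8x8 block of each 64x64 CTU, and the labels are binary split/no-split decisions. Under those assumptions it converts such a depth map into labels for the 64x64, 32x32, and 16x16 datasets. `split_labels` and the depth-map layout are hypothetical. The sketch also labels every aligned block, whereas the authors may keep only CUs that the encoder actually reached.

```python
from typing import Dict, List, Tuple

CTU_SIZE = 64
UNIT = 8                         # assumed depth-map granularity: one entry per 8x8 block
DepthMap = List[List[int]]       # (CTU_SIZE // UNIT) rows of CU depths 0..3
Sample = Tuple[int, int, int]    # (x, y, label): label 1 = split, 0 = no split


def split_labels(depth_map: DepthMap) -> Dict[int, List[Sample]]:
    """Derive split/no-split labels for every aligned 64x64, 32x32 and 16x16 block."""
    labels: Dict[int, List[Sample]] = {}
    for depth, size in enumerate((64, 32, 16)):   # CU depth 0, 1, 2
        n = size // UNIT                          # depth-map entries per CU side
        samples: List[Sample] = []
        for y in range(0, CTU_SIZE, size):
            for x in range(0, CTU_SIZE, size):
                units = [depth_map[y // UNIT + i][x // UNIT + j]
                         for i in range(n) for j in range(n)]
                # The CU was split if any 8x8 unit inside it lies deeper in the quadtree.
                samples.append((x, y, int(max(units) > depth)))
        labels[size] = samples
    return labels


# Example CTU: only the top-left 32x32 quadrant is split, and inside it only the
# top-left 16x16 block is split further into 8x8 CUs.
depth_map = [[3 if r < 2 and c < 2 else 2 if r < 4 and c < 4 else 1
              for c in range(8)] for r in range(8)]
print(split_labels(depth_map)[32])  # [(0, 0, 1), (32, 0, 0), (0, 32, 0), (32, 32, 0)]
```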