CNN Quadtree Depth Decision Prediction for Block Partitioning in HEVC Intra-Mode
Authors: Iris Linck, A. T. Gómez, G. Alaghband
Published in: 2023 Data Compression Conference (DCC)
Publication date: 2023-03-01
DOI: 10.1109/DCC55655.2023.00054
Citations: 0
Abstract
High Efficiency Video Coding (HEVC) is a recent international standard for digital video coding. HEVC achieves higher compression than its predecessor, H.264/AVC, but at the cost of a dramatic increase in coding complexity. The increase comes from the recursive quadtree that partitions every frame into blocks of various sizes, a search carried out as part of prediction mode decision. We propose three CNNs based on VGGNet, one for each coding unit (CU) size of 64x64, 32x32, and 16x16, as shown in Figure 1, to predict the quadtree depth of HEVC CU blocks and thereby reduce encoding complexity. The new CNNs use fewer convolutional layers than the original VGGNet while keeping its 3x3 filters. Because our models are designed to recognize the quadtree structure of a block of pixels rather than image categories, a shallow version of VGGNet trained on our CU partition datasets can provide fast and accurate results. Accuracy is further improved because each network's input size matches the size of the CU encoded by HEVC, so the blocks need not be resized and no CU texture features are lost. Our CNN models are trained on three custom datasets of CU blocks encoded at a quantization parameter (QP) of 32. As a result, there is no need to add QP as a parameter to the loss function, as other works do, which further increases accuracy. Given the success of this approach, future models will be trained separately for QPs of 22, 27, and 37.
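The abstract does not spell out how the outputs of the three per-size networks drive the quadtree. The following is a minimal sketch, assuming each network acts as a binary split/no-split classifier for its CU size. Under that assumption, the classifiers are applied top-down from a 64x64 coding tree unit down to the 8x8 minimum CU, and the resulting leaves are rasterized into a depth map on an 8x8 grid. The paper may instead predict depth levels directly. Every name here is hypothetical and not taken from the paper. That includes `partition`, `depth_map`, and `variance_predictor`, a simple variance threshold standing in for the trained CNNs, which are not reproduced.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Block = Sequence[Sequence[float]]          # square block of luma samples
SplitPredictor = Callable[[Block], bool]   # True -> split this CU into four
Leaf = Tuple[int, int, int, int]           # (y, x, size, depth) of a final CU

MIN_CU = 8  # smallest HEVC coding unit; an 8x8 CU is never split further


def crop(block: Block, y: int, x: int, size: int) -> List[List[float]]:
    """Extract the size x size sub-block whose top-left corner is (y, x)."""
    return [list(row[x:x + size]) for row in block[y:y + size]]


def partition(block: Block, predictors: Dict[int, SplitPredictor],
              y: int = 0, x: int = 0, depth: int = 0) -> List[Leaf]:
    """Walk the quadtree top-down, asking the predictor for the current CU
    size whether to split; returns the leaf CUs in z-scan order."""
    size = len(block)
    if size > MIN_CU and predictors[size](block):
        half = size // 2
        leaves: List[Leaf] = []
        for dy in (0, half):          # z-scan: top-left, top-right,
            for dx in (0, half):      # bottom-left, bottom-right
                sub = crop(block, dy, dx, half)
                leaves += partition(sub, predictors, y + dy, x + dx, depth + 1)
        return leaves
    return [(y, x, size, depth)]


def depth_map(leaves: List[Leaf], ctu: int = 64, unit: int = 8) -> List[List[int]]:
    """Rasterize leaf CUs into a (ctu/unit) x (ctu/unit) grid of quadtree depths."""
    n = ctu // unit
    grid = [[0] * n for _ in range(n)]
    for y, x, size, depth in leaves:
        for r in range(y // unit, (y + size) // unit):
            for c in range(x // unit, (x + size) // unit):
                grid[r][c] = depth
    return grid


def variance_predictor(threshold: float) -> SplitPredictor:
    """Hypothetical stand-in for a trained CNN: split if sample variance is high."""
    def predict(block: Block) -> bool:
        vals = [v for row in block for v in row]
        mean = sum(vals) / len(vals)
        return sum((v - mean) ** 2 for v in vals) / len(vals) > threshold
    return predict


# One predictor per CU size, mirroring the three per-size networks.
predictors = {size: variance_predictor(100.0) for size in (64, 32, 16)}

# Flat 64x64 CTU with a 0/255 checkerboard in its top-left 16x16 corner.
ctu = [[255 if (r < 16 and c < 16 and (r + c) % 2) else 0 for c in range(64)]
       for r in range(64)]
leaves = partition(ctu, predictors)
for row in depth_map(leaves):
    print(row)
```

With this input, the textured corner is split down to four 8x8 CUs (depth 3). The rest of the top-left 32x32 quadrant becomes three 16x16 CUs (depth 2), and the three flat quadrants stay as 32x32 CUs (depth 1). In an encoder, predictions like these would let it skip the rate-distortion search at depths the networks rule out.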