Two-phase Scheme for Trimming QTMT CU Partition using Multi-branch Convolutional Neural Networks
Pin-Chieh Fu, Chia-Cheng Yen, Nien-Chen Yang, Jia-Shung Wang
2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS), June 6, 2021
DOI: 10.1109/AICAS51828.2021.9458479
Citations: 7
Abstract
Versatile Video Coding (VVC), whose standardization effort was initiated in October 2017, aims to provide the same subjective quality at roughly 50% of the bitrate of its predecessor HEVC. VVC introduces a quad-tree plus multi-type tree block-partitioning structure (QT + MTT, or QTMT) within each $128 \times 128$ block; however, this added flexibility substantially increases encoding complexity. To tackle this problem effectively, this work presents a two-phase scheme for trimming the QTMT CU partition using a multi-branch CNN. The goal is to predict the QTMT partition depth for each block of size $32 \times 32$. In the first phase, a backbone CNN followed by three parallel branches extracts latent features to predict the QT depth and whether ternary-tree (TT) splitting is used. In the second phase, based on this prediction information, a large number of possible (distinct) QTMT CU partition combinations can be trimmed to reduce computational complexity. However, the use of multiple branches significantly increases the number of network parameters in the CNN, and consequently the total computation for both training and inference. Therefore, the efficient deep-learning modules of MobileNetV2 are applied to reduce the number of parameters to an adequate level. Experimental results show that the proposed method achieves an average encoding-time saving of 42.341% over all VVC test sequences, with a 0.71% Bjøntegaard-Delta bit-rate (BD-BR) increase compared with VTM 6.1 in the All-intra (AI) configuration.
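To make the architecture described in the abstract concrete, the following is a minimal PyTorch sketch (not the authors' implementation) of a multi-branch CNN operating on $32 \times 32$ blocks: a shared backbone built from MobileNetV2-style inverted-residual blocks feeding three parallel prediction heads. The exact semantics of the three branches (here assumed to be one QT-depth classifier plus two binary TT-usage flags), the channel widths, and the number of QT-depth classes are illustrative assumptions, not details taken from the paper.

```python
# Hedged sketch of a multi-branch CNN for QTMT partition prediction.
# Assumptions (not from the paper): 1-channel 32x32 luma input, 4 QT-depth classes,
# three heads = QT depth + horizontal-TT flag + vertical-TT flag, chosen channel widths.
import torch
import torch.nn as nn


class InvertedResidual(nn.Module):
    """MobileNetV2-style block: pointwise expand -> depthwise conv -> pointwise project."""

    def __init__(self, in_ch, out_ch, stride=1, expand=4):
        super().__init__()
        hidden = in_ch * expand
        self.use_residual = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, stride, 1, groups=hidden, bias=False),  # depthwise
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        return x + self.block(x) if self.use_residual else self.block(x)


class MultiBranchPartitionNet(nn.Module):
    """Shared backbone followed by three parallel branches (head meanings are assumed)."""

    def __init__(self, num_qt_depths=4):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1, bias=False),  # 32x32 -> 16x16
            nn.BatchNorm2d(16), nn.ReLU6(inplace=True),
            InvertedResidual(16, 24, stride=2),                    # 16x16 -> 8x8
            InvertedResidual(24, 24),
            InvertedResidual(24, 48, stride=2),                    # 8x8 -> 4x4
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.qt_depth_head = nn.Linear(48, num_qt_depths)  # which QT depth to use
        self.tt_hor_head = nn.Linear(48, 1)                # use horizontal TT split? (assumed)
        self.tt_ver_head = nn.Linear(48, 1)                # use vertical TT split? (assumed)

    def forward(self, x):
        feat = self.backbone(x)
        return self.qt_depth_head(feat), self.tt_hor_head(feat), self.tt_ver_head(feat)


if __name__ == "__main__":
    net = MultiBranchPartitionNet()
    luma_blocks = torch.randn(8, 1, 32, 32)   # a batch of 32x32 luma blocks
    qt_logits, tt_h, tt_v = net(luma_blocks)
    print(qt_logits.shape, tt_h.shape, tt_v.shape)  # torch.Size([8, 4]) ([8, 1]) ([8, 1])
```

In the second phase, predictions like these would be used to skip rate-distortion checks for QTMT split combinations that are inconsistent with the predicted QT depth and TT-usage flags; the depthwise-separable inverted-residual blocks keep the parameter count low, in the spirit of the MobileNetV2 modules mentioned in the abstract.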