Two-phase Scheme for Trimming QTMT CU Partition using Multi-branch Convolutional Neural Networks
Pin-Chieh Fu, Chia-Cheng Yen, Nien-Chen Yang, Jia-Shung Wang
2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS), June 6, 2021
DOI: 10.1109/AICAS51828.2021.9458479
Citations: 7
Abstract
Versatile Video Coding (VVC), whose standardization effort was initiated in October 2017, aims to provide the same subjective quality at roughly 50% of the bitrate of its predecessor HEVC. VVC introduces a quad-tree plus multi-type tree block-partitioning structure (QT + MTT, or QTMT) within each $128 \times 128$ block; however, this added flexibility substantially increases encoding complexity. To tackle this problem effectively, this work presents a two-phase scheme for trimming the QTMT CU partition using a multi-branch CNN. The goal is to predict the QTMT partition depth for each block of size $32 \times 32$. In the first phase, a backbone CNN followed by three parallel branches extracts latent features to predict the QT depth and whether ternary-tree (TT) splitting is used. In the second phase, based on this prediction information, a large number of possible (distinct) QTMT CU partition combinations can be trimmed to reduce computational complexity. However, the use of multiple branches significantly increases the number of network parameters in the CNN, and consequently the total computation for both training and inference. Therefore, the efficient deep-learning modules of MobileNetV2 are applied to reduce the number of parameters to an adequate level. Experimental results show that the proposed method achieves an average encoding-time saving of 42.341% over all VVC test sequences, with a 0.71% Bjøntegaard-Delta bit-rate (BD-BR) increase compared with VTM 6.1 in the All-intra (AI) configuration.
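To make the architecture described in the abstract concrete, the following is a minimal PyTorch sketch (not the authors' implementation) of a multi-branch CNN operating on $32 \times 32$ blocks: a shared backbone built from MobileNetV2-style inverted-residual blocks feeding three parallel prediction heads. The exact semantics of the three branches (here assumed to be one QT-depth classifier plus two binary TT-usage flags), the channel widths, and the number of QT-depth classes are illustrative assumptions, not details taken from the paper.

```python
# Hedged sketch of a multi-branch CNN for QTMT partition prediction.
# Assumptions (not from the paper): 1-channel 32x32 luma input, 4 QT-depth classes,
# three heads = QT depth + horizontal-TT flag + vertical-TT flag, chosen channel widths.
import torch
import torch.nn as nn


class InvertedResidual(nn.Module):
    """MobileNetV2-style block: pointwise expand -> depthwise conv -> pointwise project."""

    def __init__(self, in_ch, out_ch, stride=1, expand=4):
        super().__init__()
        hidden = in_ch * expand
        self.use_residual = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, stride, 1, groups=hidden, bias=False),  # depthwise
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        return x + self.block(x) if self.use_residual else self.block(x)


class MultiBranchPartitionNet(nn.Module):
    """Shared backbone followed by three parallel branches (head meanings are assumed)."""

    def __init__(self, num_qt_depths=4):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1, bias=False),  # 32x32 -> 16x16
            nn.BatchNorm2d(16), nn.ReLU6(inplace=True),
            InvertedResidual(16, 24, stride=2),                    # 16x16 -> 8x8
            InvertedResidual(24, 24),
            InvertedResidual(24, 48, stride=2),                    # 8x8 -> 4x4
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.qt_depth_head = nn.Linear(48, num_qt_depths)  # which QT depth to use
        self.tt_hor_head = nn.Linear(48, 1)                # use horizontal TT split? (assumed)
        self.tt_ver_head = nn.Linear(48, 1)                # use vertical TT split? (assumed)

    def forward(self, x):
        feat = self.backbone(x)
        return self.qt_depth_head(feat), self.tt_hor_head(feat), self.tt_ver_head(feat)


if __name__ == "__main__":
    net = MultiBranchPartitionNet()
    luma_blocks = torch.randn(8, 1, 32, 32)   # a batch of 32x32 luma blocks
    qt_logits, tt_h, tt_v = net(luma_blocks)
    print(qt_logits.shape, tt_h.shape, tt_v.shape)  # torch.Size([8, 4]) ([8, 1]) ([8, 1])
```

In the second phase, predictions like these would be used to skip rate-distortion checks for QTMT split combinations that are inconsistent with the predicted QT depth and TT-usage flags; the depthwise-separable inverted-residual blocks keep the parameter count low, in the spirit of the MobileNetV2 modules mentioned in the abstract.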