Machine Learning Accelerated Transform Search For AV1

2019 Picture Coding Symposium (PCS) Pub Date : 2019-11-01 DOI:10.1109/PCS48520.2019.8954514

Hui Su, Mingliang Chen, A. Bokov, D. Mukherjee, Yunqing Wang, Yue Chen


Citations: 6

Abstract

AV1 is the state-of-the-art open and royalty-free video compression format that achieves significant bitrate savings over the previous generation of video codecs. One of AV1’s major improvements over its predecessor VP9 is support for more diverse and flexible transform size and kernel selection. However, this flexibility also drastically increases the search space for transform unit rate-distortion optimization in AV1 encoders. Unlike conventional encoder speed features, which are based on heuristics, we propose a machine learning (ML) based approach to accelerate the transform size and kernel search for AV1. The ML models use input features extracted from the prediction residue block, such as standard deviation, correlation, and energy distribution. The output of the models estimates the likelihood that each transform size and kernel would be selected as the optimal choice. Based on the ML models, the encoder can prune the transform size and kernel candidates that are unlikely to be selected and avoid the unnecessary computation of their rate-distortion costs. The proposed approach is implemented and tested on the AV1 reference library libaom. The experimental results show that satisfactory encoding speed improvement can be achieved with extremely low compression performance loss. The framework and methodology can also be easily migrated to other video codecs and implementations.
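The pruning idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation or libaom code: the exact feature definitions, the function names (`residual_features`, `prune_candidates`), the `keep_ratio` threshold, and the hand-picked scores standing in for a trained model's output are all assumptions made for illustration. The sketch also assumes blocks with even dimensions.

```python
import math


def residual_features(block):
    """Illustrative features of a 2-D residual block (list of equal-length rows).

    Returns [std, horizontal corr, vertical corr, four quadrant energy shares].
    Assumes even block height and width.
    """
    h, w = len(block), len(block[0])
    n = h * w
    mean = sum(sum(row) for row in block) / n
    var = sum((v - mean) ** 2 for row in block for v in row) / n
    std = math.sqrt(var)

    def corr(pairs):
        # Approximate normalized correlation of neighbouring samples,
        # using the block mean and variance; 0.0 for a flat block.
        if var == 0.0 or not pairs:
            return 0.0
        return sum((a - mean) * (b - mean) for a, b in pairs) / (len(pairs) * var)

    horz = [(block[r][c], block[r][c + 1]) for r in range(h) for c in range(w - 1)]
    vert = [(block[r][c], block[r + 1][c]) for r in range(h - 1) for c in range(w)]

    # Energy distribution: share of the total squared residual in each quadrant.
    total = sum(v * v for row in block for v in row)
    quads = []
    for r0 in (0, h // 2):
        for c0 in (0, w // 2):
            e = sum(block[r][c] ** 2
                    for r in range(r0, r0 + h // 2)
                    for c in range(c0, c0 + w // 2))
            quads.append(e / total if total else 0.25)
    return [std, corr(horz), corr(vert)] + quads


def softmax(scores):
    """Turn raw model scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def prune_candidates(candidates, scores, keep_ratio=0.1):
    """Keep candidates whose probability is at least keep_ratio times the best.

    keep_ratio is a hypothetical tuning knob: lower values prune less.
    """
    probs = softmax(scores)
    best = max(probs)
    return [c for c, p in zip(candidates, probs) if p >= keep_ratio * best]


# Usage: a 4x4 residual with most energy in the top-left corner.
block = [[4, 3, 1, 0],
         [3, 2, 1, 0],
         [1, 1, 0, 0],
         [0, 0, 0, 0]]
feats = residual_features(block)

# Hand-picked scores standing in for a trained model's output (assumption).
kernels = ["DCT_DCT", "ADST_DCT", "DCT_ADST", "ADST_ADST"]
scores = [2.0, 0.5, -1.0, -3.0]
kept = prune_candidates(kernels, scores)
print(feats)
print(kept)
```

Only the kernels in `kept` would go on to the full rate-distortion cost evaluation, which is where the speedup would come from. Pruning relative to the best probability, rather than keeping a fixed number of candidates, is one possible design choice: it lets the encoder search more candidates when the model is uncertain and fewer when it is confident.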

