Video Coding for Machines with Feature-Based Rate-Distortion Optimization

2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP) Pub Date : 2020-09-21 DOI:10.1109/MMSP48831.2020.9287136

Kristian Fischer, Fabian Brand, Christian Herglotz, A. Kaup

{"title":"Video Coding for Machines with Feature-Based Rate-Distortion Optimization","authors":"Kristian Fischer, Fabian Brand, Christian Herglotz, A. Kaup","doi":"10.1109/MMSP48831.2020.9287136","DOIUrl":null,"url":null,"abstract":"Common state-of-the-art video codecs are optimized to deliver a low bitrate by providing a certain quality for the final human observer, which is achieved by rate-distortion optimization (RDO). But, with the steady improvement of neural networks solving computer vision tasks, more and more multimedia data is not observed by humans anymore, but directly analyzed by neural networks. In this paper, we propose a standard-compliant feature-based RDO (FRDO) that is designed to increase the coding performance, when the decoded frame is analyzed by a neural network in a video coding for machine scenario. To that extent, we replace the pixel-based distortion metrics in conventional RDO of VTM-8.0 with distortion metrics calculated in the feature space created by the first layers of a neural network. Throughout several tests with the segmentation network Mask R-CNN and single images from the Cityscapes dataset, we compare the proposed FRDO and its hybrid version HFRDO with different distortion measures in the feature space against the conventional RDO. With HFRDO, up to 5.49% bitrate can be saved compared to the VTM-8.0 implementation in terms of Bjøntegaard Delta Rate and using the weighted average precision as quality metric. Additionally, allowing the encoder to vary the quantization parameter results in coding gains for the proposed HFRDO of up 9.95% compared to conventional VTM.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"23","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MMSP48831.2020.9287136","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 23

Abstract

Common state-of-the-art video codecs are optimized to deliver a low bitrate by providing a certain quality for the final human observer, which is achieved by rate-distortion optimization (RDO). But, with the steady improvement of neural networks solving computer vision tasks, more and more multimedia data is not observed by humans anymore, but directly analyzed by neural networks. In this paper, we propose a standard-compliant feature-based RDO (FRDO) that is designed to increase the coding performance, when the decoded frame is analyzed by a neural network in a video coding for machine scenario. To that extent, we replace the pixel-based distortion metrics in conventional RDO of VTM-8.0 with distortion metrics calculated in the feature space created by the first layers of a neural network. Throughout several tests with the segmentation network Mask R-CNN and single images from the Cityscapes dataset, we compare the proposed FRDO and its hybrid version HFRDO with different distortion measures in the feature space against the conventional RDO. With HFRDO, up to 5.49% bitrate can be saved compared to the VTM-8.0 implementation in terms of Bjøntegaard Delta Rate and using the weighted average precision as quality metric. Additionally, allowing the encoder to vary the quantization parameter results in coding gains for the proposed HFRDO of up 9.95% compared to conventional VTM.

查看原文本刊更多论文

基于特征率失真优化的机器视频编码

常见的最先进的视频编解码器经过优化，通过为最终的人类观察者提供一定的质量来提供低比特率，这是通过率失真优化(RDO)实现的。但是，随着神经网络解决计算机视觉任务能力的不断提高，越来越多的多媒体数据不再由人类观察，而是由神经网络直接分析。在本文中，我们提出了一种符合标准的基于特征的RDO (FRDO)，旨在提高机器视频编码场景中解码帧的神经网络分析编码性能。在这种程度上，我们将VTM-8.0的传统RDO中基于像素的失真度量替换为在神经网络的第一层创建的特征空间中计算的失真度量。通过对分割网络Mask R-CNN和来自cityscape数据集的单个图像的多次测试，我们将提出的FRDO及其混合版本HFRDO与传统的RDO在特征空间中具有不同的失真措施进行了比较。与VTM-8.0相比，HFRDO在Bjøntegaard Delta Rate方面可以节省5.49%的比特率，并使用加权平均精度作为质量指标。此外，允许编码器改变量化参数，与传统VTM相比，所提出的HFRDO的编码增益高达9.95%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)

自引率

0.00%

发文量