A Mixed Appearance-based and Coding Distortion-based CNN Fusion Approach for In-loop Filtering in Video Coding

Jian Yue, Yanbo Gao, Shuai Li, Menghu Jia
2020 IEEE International Conference on Visual Communications and Image Processing (VCIP), published 2020-12-01
DOI: 10.1109/VCIP49819.2020.9301895 (https://doi.org/10.1109/VCIP49819.2020.9301895)
Citations: 3

Abstract

With the success of convolutional neural networks (CNNs) in image denoising and other computer vision tasks, CNNs have been investigated for in-loop filtering in video coding. Many existing methods use CNNs directly as powerful filtering tools without much analysis of their effect. Because in-loop filters process reconstructed video frames produced by a fixed pipeline of video coding operations, the coding distortion in those frames may share properties that a CNN can learn, beyond the frame simply being a noisy image. In this paper, we therefore first categorize CNN-based filtering into two types of processes, appearance-based CNN filtering and coding distortion-based CNN filtering, and develop a two-stream CNN fusion framework accordingly. In appearance-based CNN filtering, a CNN treats the reconstructed frame as a distorted image and extracts global appearance information to restore the original image. To extract this global information, a CNN with pooling is used first to increase the receptive field, and up-sampling is added in the late stage to produce pixel-level frame information. In contrast, in coding distortion-based filtering, a CNN treats the reconstructed frame as blocks with certain types of distortion and focuses on local information to learn the coding distortion resulting from the fixed video coding pipeline. Finally, the appearance-based filtering stream and the coding distortion-based filtering stream are fused to combine the two aspects of CNN filtering, and with them the global and local information. To further reduce complexity, the similar initial and last convolutional layers are shared across the two streams to form a mixed CNN. Experiments demonstrate that the proposed method outperforms existing CNN-based filtering methods, with an 11.26% BD-rate saving under the All Intra configuration.
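The abstract's point that pooling enlarges the receptive field can be checked with the standard receptive-field recurrence. The sketch below is illustrative only: the layer counts, the 3x3 kernels, the 2x2 pooling, and the names `receptive_field`, `local_stream`, and `global_stream` are assumptions made for this example, not the configuration reported in the paper.

```python
# Illustrative only: all layer settings below are assumed, not taken from the paper.

def receptive_field(layers):
    """Return the receptive field (in input pixels) of a stack of layers.

    Each layer is a (kernel_size, stride) pair. Uses the standard recurrence
    r <- r + (k - 1) * j,  j <- j * s, where j is the cumulative stride.
    """
    r, j = 1, 1
    for kernel, stride in layers:
        r += (kernel - 1) * j
        j *= stride
    return r

CONV3 = (3, 1)  # 3x3 convolution, stride 1
POOL2 = (2, 2)  # 2x2 pooling, stride 2

# Hypothetical "local" stream: four plain 3x3 convolutions, no pooling.
local_stream = [CONV3] * 4

# Hypothetical "global" stream: the same four convolutions,
# with a 2x2 pooling layer after each of the first two.
global_stream = [CONV3, POOL2, CONV3, POOL2, CONV3, CONV3]

local_rf = receptive_field(local_stream)
global_rf = receptive_field(global_stream)
print(local_rf, global_rf)  # prints: 9 26
```

Under these assumed settings the plain stack sees a 9x9 input window while the pooled stack sees a 26x26 window with the same number of convolutions. The pooled stack also works at a quarter of the input resolution per dimension, which is why the appearance-based stream needs up-sampling in its late stage to return to pixel level.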