Swin-VEC: Video Swin Transformer-based GAN for video error concealment of VVC

Bing Zhang, Ran Ma, Yu Cao, Ping An
{"title":"Swin-VEC: Video Swin Transformer-based GAN for video error concealment of VVC","authors":"Bing Zhang, Ran Ma, Yu Cao, Ping An","doi":"10.1007/s00371-024-03518-9","DOIUrl":null,"url":null,"abstract":"<p>Video error concealment can effectively improve the visual perception quality of videos damaged by packet loss in video transmission or error reception at the decoder. The latest versatile video coding (VVC) standard further improves the compression performance and lacks error recovery mechanism, which makes the VVC bitstream highly sensitive to errors. Most of the existing error concealment algorithms are designed for the video coding standards before VVC and are not applicable to VVC; thus, the research on video error concealment for VVC is urgently needed. In this paper, a novel deep video error concealment model for VVC is proposed, called Swin-VEC. The model innovatively integrates Video Swin Transformer into the generator of generative adversarial network (GAN). Specifically, the generator of the model employs convolutional neural network (CNN) to extract shallow features, and utilizes the Video Swin Transformer to extract deep multi-scale features. Subsequently, the designed dual upsampling modules are used to accomplish the recovery of spatiotemporal dimensions, and combined with CNN to achieve frame reconstruction. Moreover, an augmented dataset BVI-DVC-VVC is constructed for model training and verification. The optimization of the model is realized by adversarial training. Extensive experiments on BVI-DVC-VVC and UCF101 demonstrate the effectiveness and superiority of our proposed model for the video error concealment of VVC.\n</p>","PeriodicalId":501186,"journal":{"name":"The Visual Computer","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"The Visual Computer","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s00371-024-03518-9","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Video error concealment can effectively improve the visual perception quality of videos damaged by packet loss in video transmission or error reception at the decoder. The latest versatile video coding (VVC) standard further improves the compression performance and lacks error recovery mechanism, which makes the VVC bitstream highly sensitive to errors. Most of the existing error concealment algorithms are designed for the video coding standards before VVC and are not applicable to VVC; thus, the research on video error concealment for VVC is urgently needed. In this paper, a novel deep video error concealment model for VVC is proposed, called Swin-VEC. The model innovatively integrates Video Swin Transformer into the generator of generative adversarial network (GAN). Specifically, the generator of the model employs convolutional neural network (CNN) to extract shallow features, and utilizes the Video Swin Transformer to extract deep multi-scale features. Subsequently, the designed dual upsampling modules are used to accomplish the recovery of spatiotemporal dimensions, and combined with CNN to achieve frame reconstruction. Moreover, an augmented dataset BVI-DVC-VVC is constructed for model training and verification. The optimization of the model is realized by adversarial training. Extensive experiments on BVI-DVC-VVC and UCF101 demonstrate the effectiveness and superiority of our proposed model for the video error concealment of VVC.

Abstract Image

Swin-VEC:基于视频斯温变换器的 GAN,用于 VVC 的视频错误隐藏
视频错误隐藏可以有效改善因视频传输过程中丢包或解码器接收错误而受损的视频的视觉感知质量。最新的通用视频编码(VVC)标准进一步提高了压缩性能,但缺乏错误恢复机制,这使得 VVC 比特流对错误高度敏感。现有的错误隐藏算法大多是针对 VVC 之前的视频编码标准设计的,不适用于 VVC;因此,针对 VVC 的视频错误隐藏研究迫在眉睫。本文提出了一种新型的 VVC 深度视频错误隐藏模型,称为 Swin-VEC。该模型创新性地将视频 Swin 变换器集成到生成式对抗网络(GAN)的生成器中。具体来说,该模型的生成器采用卷积神经网络(CNN)提取浅层特征,并利用视频 Swin 变换器提取深层多尺度特征。随后,利用设计的双上采样模块完成时空维度的恢复,并结合 CNN 实现帧重建。此外,还构建了一个增强数据集 BVI-DVC-VVC,用于模型训练和验证。模型的优化是通过对抗训练实现的。在 BVI-DVC-VVC 和 UCF101 上进行的大量实验证明了我们提出的模型在 VVC 视频错误隐藏方面的有效性和优越性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信