Multi-Domain Spatial-Temporal Redundancy Mining for Efficient Learned Video Compression

IF 4.8 · CAS Tier 1 (Computer Science) · JCR Q2 (Engineering, Electrical & Electronic)
Feng Yuan;Zhaoqing Pan;Jianjun Lei;Bo Peng;Fu Lee Wang;Sam Kwong
{"title":"Multi-Domain Spatial-Temporal Redundancy Mining for Efficient Learned Video Compression","authors":"Feng Yuan;Zhaoqing Pan;Jianjun Lei;Bo Peng;Fu Lee Wang;Sam Kwong","doi":"10.1109/TBC.2025.3587532","DOIUrl":null,"url":null,"abstract":"The Conditional Coding-based Learned Video Compression (CC-LVC) has become an important paradigm in learned video compression, because it can effectively explore spatial-temporal redundancies within a huge context space. However, existing CC-LVC methods cannot accurately model motion information and efficiently mine contextual correlations for complex regions with non-rigid motions and non-linear deformations. To address these problems, an efficient CC-LVC method is proposed in this paper, which mines spatial-temporal dependencies across multiple motion domains and receptive domains for improving the video coding efficiency. To accurately model complex motions and generate precise temporal contexts, a Multi-domain Motion modeling Network (MMNet) is proposed to capture robust motion information from both spatial and frequency domains. Moreover, a multi-domain context refinement module is developed to discriminatively highlight frequency-domain temporal contexts and adaptively fuse multi-domain temporal contexts, which can effectively mitigate inaccuracies in temporal contexts caused by motion errors. In order to efficiently compress video signals, a Multi-scale Long Short-range Decorrelation Module (MLSDM)-based context codec is proposed, in which an MLSDM is designed to learn long short-range spatial-temporal dependencies and channel-wise correlations across varying receptive domains. Extensive experimental results show that the proposed method significantly outperforms VTM 17.0 and other state-of-the-art learned video compression methods in terms of both PSNR and MS-SSIM.","PeriodicalId":13159,"journal":{"name":"IEEE Transactions on Broadcasting","volume":"71 3","pages":"808-820"},"PeriodicalIF":4.8000,"publicationDate":"2025-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Broadcasting","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/11090160/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0

Abstract

Conditional Coding-based Learned Video Compression (CC-LVC) has become an important paradigm in learned video compression because it can effectively explore spatial-temporal redundancies within a large context space. However, existing CC-LVC methods cannot accurately model motion information or efficiently mine contextual correlations in complex regions with non-rigid motions and non-linear deformations. To address these problems, this paper proposes an efficient CC-LVC method that mines spatial-temporal dependencies across multiple motion domains and receptive domains to improve video coding efficiency. To accurately model complex motions and generate precise temporal contexts, a Multi-domain Motion modeling Network (MMNet) is proposed to capture robust motion information from both the spatial and frequency domains. Moreover, a multi-domain context refinement module is developed to discriminatively highlight frequency-domain temporal contexts and adaptively fuse multi-domain temporal contexts, which effectively mitigates inaccuracies in temporal contexts caused by motion errors. To efficiently compress video signals, a Multi-scale Long Short-range Decorrelation Module (MLSDM)-based context codec is proposed, in which the MLSDM is designed to learn long short-range spatial-temporal dependencies and channel-wise correlations across varying receptive domains. Extensive experimental results show that the proposed method significantly outperforms VTM 17.0 and other state-of-the-art learned video compression methods in terms of both PSNR and MS-SSIM.
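
To make the multi-domain idea concrete, below is a minimal PyTorch sketch of how a temporal context might be processed in both the spatial and frequency domains and then fused. The paper's MMNet and context refinement module are not specified on this page, so the module name, layer choices, and the use of a 2D FFT (in the style of a Fourier unit from Fast Fourier Convolution) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of spatial/frequency multi-domain context fusion.
# All names and layer choices are assumptions for illustration only.
import torch
import torch.nn as nn


class MultiDomainContextFusion(nn.Module):
    """Fuse a spatial-domain context with a frequency-domain context.

    Illustrative stand-in for the multi-domain processing described in
    the abstract; not the paper's actual MMNet or refinement module.
    """

    def __init__(self, channels: int = 64):
        super().__init__()
        # Spatial branch: an ordinary local 3x3 convolution.
        self.spatial_branch = nn.Conv2d(channels, channels, 3, padding=1)
        # The FFT of a real feature map is complex-valued; treat the real
        # and imaginary parts as separate channels for a 1x1 convolution.
        self.freq_branch = nn.Conv2d(2 * channels, 2 * channels, 1)
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, context: torch.Tensor) -> torch.Tensor:
        # Spatial-domain path: local receptive field.
        spatial = self.spatial_branch(context)

        # Frequency-domain path: a 2D real FFT gives every output
        # position a global receptive field in a single step.
        freq = torch.fft.rfft2(context, norm="ortho")
        freq = torch.cat([freq.real, freq.imag], dim=1)
        freq = self.freq_branch(freq)
        real, imag = freq.chunk(2, dim=1)
        freq = torch.fft.irfft2(torch.complex(real, imag),
                                s=context.shape[-2:], norm="ortho")

        # Adaptive fusion of the two domains (here simply concat + 1x1).
        return self.fuse(torch.cat([spatial, freq], dim=1))


if __name__ == "__main__":
    ctx = torch.randn(1, 64, 32, 32)  # e.g., a warped temporal context
    out = MultiDomainContextFusion()(ctx)
    print(out.shape)  # torch.Size([1, 64, 32, 32])
```

The frequency branch is one plausible way to capture the long-range, global dependencies that the abstract attributes to its multi-domain design, while the spatial branch retains local detail; the actual fusion in the paper is described as adaptive and discriminative rather than a plain concatenation.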
Source Journal
IEEE Transactions on Broadcasting (Engineering & Technology: Telecommunications)
CiteScore: 9.40
Self-citation rate: 31.10%
Annual publications: 79
Review time: 6-12 weeks
Journal Introduction: The Society’s Field of Interest is “Devices, equipment, techniques and systems related to broadcast technology, including the production, distribution, transmission, and propagation aspects.” In addition to this formal FOI statement, which is used to provide guidance to the Publications Committee in the selection of content, the AdCom has further resolved that “broadcast systems includes all aspects of transmission, propagation, and reception.”