Multi-Domain Spatial-Temporal Redundancy Mining for Efficient Learned Video Compression
Feng Yuan; Zhaoqing Pan; Jianjun Lei; Bo Peng; Fu Lee Wang; Sam Kwong
IEEE Transactions on Broadcasting, vol. 71, no. 3, pp. 808-820, published 2025-07-23
DOI: 10.1109/TBC.2025.3587532
URL: https://ieeexplore.ieee.org/document/11090160/
Citations: 0
Abstract
Conditional Coding-based Learned Video Compression (CC-LVC) has become an important paradigm in learned video compression because it can effectively exploit spatial-temporal redundancies within a large context space. However, existing CC-LVC methods cannot accurately model motion information or efficiently mine contextual correlations in complex regions with non-rigid motions and non-linear deformations. To address these problems, this paper proposes an efficient CC-LVC method that mines spatial-temporal dependencies across multiple motion domains and receptive domains to improve video coding efficiency. To accurately model complex motions and generate precise temporal contexts, a Multi-domain Motion modeling Network (MMNet) is proposed to capture robust motion information from both the spatial and frequency domains. Moreover, a multi-domain context refinement module is developed to discriminatively highlight frequency-domain temporal contexts and adaptively fuse multi-domain temporal contexts, effectively mitigating inaccuracies in temporal contexts caused by motion errors. To efficiently compress video signals, a Multi-scale Long Short-range Decorrelation Module (MLSDM)-based context codec is proposed, in which the MLSDM learns long short-range spatial-temporal dependencies and channel-wise correlations across varying receptive domains. Extensive experimental results show that the proposed method significantly outperforms VTM 17.0 and other state-of-the-art learned video compression methods in terms of both PSNR and MS-SSIM.
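The conditional-coding paradigm the abstract builds on can be illustrated with a toy rate estimate: when the entropy model is allowed to see a motion-aligned temporal context, it only has to account for what that context fails to predict. The sketch below is a minimal illustration with synthetic data and a Gaussian differential-entropy proxy; it is not the paper's model, and the frames, the ideal motion compensation, and the `gaussian_rate` helper are all illustrative assumptions.

```python
import numpy as np

def gaussian_rate(x):
    """Differential-entropy proxy (bits/sample) for a Gaussian fit to x.
    Stands in for a learned entropy model; illustrative only."""
    var = np.var(x) + 1e-12
    return 0.5 * np.log2(2 * np.pi * np.e * var)

rng = np.random.default_rng(0)

# Toy temporally correlated "frames" (synthetic, not the paper's data):
# the current frame is the previous frame shifted by one pixel plus noise.
prev = rng.normal(size=(64, 64))
curr = np.roll(prev, shift=1, axis=1) + 0.1 * rng.normal(size=(64, 64))

# Unconditional coding: model the current frame with no temporal context.
rate_uncond = gaussian_rate(curr)

# Conditional coding (sketch): the entropy model is conditioned on a
# motion-aligned temporal context, so only the unpredictable part costs bits.
# Here we assume ideal motion compensation for simplicity.
context = np.roll(prev, shift=1, axis=1)
rate_cond = gaussian_rate(curr - context)

print(f"unconditional rate: {rate_uncond:.2f} bits/sample")
print(f"conditional rate:   {rate_cond:.2f} bits/sample")
```

With an accurate context the conditional rate is far lower, which is why the paper invests in precise motion modeling: a mis-aligned context inflates the unpredictable part and the bitrate with it.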
Journal Description
The Society’s Field of Interest is “Devices, equipment, techniques and systems related to broadcast technology, including the production, distribution, transmission, and propagation aspects.” In addition to this formal FOI statement, which is used to provide guidance to the Publications Committee in the selection of content, the AdCom has further resolved that “broadcast systems includes all aspects of transmission, propagation, and reception.”