Multi-Prior Driven Resolution Rescaling Blocks for Intra Frame Coding

IF 8.4, Tier 1 (Computer Science), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Multimedia. Pub Date: 2024-09-02. DOI: 10.1109/TMM.2024.3453033

Peiying Wu, Shiwei Wang, Liquan Shen, Feifeng Wang, Zhaoyi Tian, Xia Hua

Volume 26, pages 11274-11289. IEEE Xplore: https://ieeexplore.ieee.org/document/10663239/

Citations: 0

Abstract

Deep learning techniques are increasingly integrated into rescaling-based video compression frameworks and have shown great potential for improving compression efficiency. However, existing methods achieve limited performance because 1) they treat the context priors generated by the codec as independent sources of information, ignoring potential interactions between multiple priors during rescaling, so the priors may not effectively facilitate compression; and 2) they often apply a uniform sampling ratio across regions of varying content complexity, which loses important information. To address these two issues, this paper proposes a spatial multi-prior driven resolution rescaling framework for intra-frame coding, called MP-RRF, consisting of three sub-networks: a multi-prior driven network, a downscaling network, and an upscaling network. First, the multi-prior driven network uses complexity and similarity priors to smooth unnecessarily complicated information, while leveraging similarity and quality priors to produce high-fidelity complementary information. This interaction of complexity, similarity, and quality priors ensures redundancy reduction and texture enhancement. Second, the downscaling network processes components of different granularities discriminatively to generate a compact, low-resolution image for encoding. Third, the upscaling network aggregates a complementary set of contextual multi-scale features to reconstruct realistic details, while combining variable receptive fields to suppress multi-scale compression artifacts and resampling noise. Extensive experiments show that our network achieves a 23.84% Bjøntegaard Delta Rate (BD-Rate) reduction under the all-intra configuration compared with the codec anchor, offering state-of-the-art coding performance.
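To make the second issue (a uniform sampling ratio across regions) concrete, here is a toy, content-adaptive version of the downscale, encode, upscale loop on a small grayscale frame. It is a sketch under stated assumptions, not MP-RRF: block variance stands in for a complexity prior, average pooling and nearest-neighbour upsampling stand in for the learned downscaling and upscaling networks, and uniform quantization stands in for the codec. All function names and the `threshold`/`step` parameters are hypothetical.

```python
from statistics import pvariance


def block_variance(img, r, c, size):
    """Pixel variance of a size x size block: a crude stand-in for a complexity prior."""
    return pvariance([img[i][j] for i in range(r, r + size) for j in range(c, c + size)])


def downscale_block(img, r, c, size, factor):
    """Average-pool a size x size block by `factor` (size must be divisible by factor)."""
    out = []
    for i in range(r, r + size, factor):
        row = []
        for j in range(c, c + size, factor):
            vals = [img[i + di][j + dj] for di in range(factor) for dj in range(factor)]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def quantize(block, step):
    """Stand-in for the codec: uniform scalar quantization with the given step."""
    return [[round(v / step) * step for v in row] for row in block]


def upscale_block(block, factor):
    """Nearest-neighbour upsampling: repeat every sample factor x factor times."""
    return [[v for v in row for _ in range(factor)] for row in block for _ in range(factor)]


def adaptive_rescale_code(img, block=4, threshold=100.0, step=8.0):
    """Code each block at full or half resolution depending on its variance.

    Frame height and width must be multiples of `block`.
    Returns (reconstruction, per-block factors, number of coded samples).
    """
    h, w = len(img), len(img[0])
    recon = [[0.0] * w for _ in range(h)]
    factors, samples = {}, 0
    for r in range(0, h, block):
        for c in range(0, w, block):
            # Textured blocks keep full resolution; smooth blocks are downscaled by 2.
            factor = 1 if block_variance(img, r, c, block) > threshold else 2
            coded = quantize(downscale_block(img, r, c, block, factor), step)
            samples += len(coded) * len(coded[0])
            up = upscale_block(coded, factor)
            for i in range(block):
                recon[r + i][c:c + block] = up[i]
            factors[(r, c)] = factor
    return recon, factors, samples


# Toy 8x8 frame: flat left half, high-contrast checkerboard right half.
frame = [[100.0 if j < 4 else (255.0 if (i + j) % 2 else 0.0) for j in range(8)]
         for i in range(8)]
recon, factors, samples = adaptive_rescale_code(frame, block=4, threshold=100.0, step=5.0)
print(factors)  # {(0, 0): 2, (0, 4): 1, (4, 0): 2, (4, 4): 1}
print(samples, "coded samples instead of", 8 * 8)  # 40 coded samples instead of 64
```

The smooth blocks are coded at quarter resolution while the textured ones keep every sample, so this frame needs 40 coded samples instead of 64. With `step=5.0` this particular frame is even reconstructed exactly; real content would show downscaling and quantization losses, which is what the learned networks in the paper are meant to reduce.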
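The 23.84% figure is a BD-Rate: the average bitrate difference between two rate-distortion curves at equal quality. A minimal pure-Python sketch of the classic Bjøntegaard calculation follows (fit log-rate as a cubic in PSNR for each curve, integrate both fits over the overlapping PSNR range, and convert the average log difference back to a percentage). It is not the authors' evaluation script; the rate/PSNR points are synthetic, and the helper names are hypothetical.

```python
import math


def _cubic_fit(xs, ys):
    """Least-squares cubic y ≈ a0 + a1*x + a2*x**2 + a3*x**3 via normal equations."""
    n = 4
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda row: abs(A[row][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for row in range(col + 1, n):
            f = A[row][col] / A[col][col]
            for k in range(col, n):
                A[row][k] -= f * A[col][k]
            b[row] -= f * b[col]
    # Back substitution.
    a = [0.0] * n
    for i in range(n - 1, -1, -1):
        a[i] = (b[i] - sum(A[i][k] * a[k] for k in range(i + 1, n))) / A[i][i]
    return a


def _integrate(a, lo, hi):
    """Definite integral of the polynomial sum(a[k] * x**k) over [lo, hi]."""
    antiderivative = lambda x: sum(c * x ** (k + 1) / (k + 1) for k, c in enumerate(a))
    return antiderivative(hi) - antiderivative(lo)


def bd_rate(anchor_rates, anchor_psnr, test_rates, test_psnr):
    """Average bitrate difference (%) of test vs. anchor at equal PSNR; negative = savings."""
    if min(len(anchor_rates), len(test_rates)) < 4:
        raise ValueError("need at least four rate/PSNR points per curve")
    # Map PSNR to roughly [-1, 1] with one shared affine transform to keep the
    # normal equations well conditioned; interval averages are unchanged by it.
    q = list(anchor_psnr) + list(test_psnr)
    centre = sum(q) / len(q)
    scale = (max(q) - min(q)) / 2 or 1.0
    xa = [(v - centre) / scale for v in anchor_psnr]
    xt = [(v - centre) / scale for v in test_psnr]
    # Fit log(rate) as a cubic function of (normalized) PSNR for each curve.
    pa = _cubic_fit(xa, [math.log(r) for r in anchor_rates])
    pt = _cubic_fit(xt, [math.log(r) for r in test_rates])
    lo, hi = max(min(xa), min(xt)), min(max(xa), max(xt))
    if hi <= lo:
        raise ValueError("PSNR ranges of the two curves do not overlap")
    avg_log_diff = (_integrate(pt, lo, hi) - _integrate(pa, lo, hi)) / (hi - lo)
    return (math.exp(avg_log_diff) - 1.0) * 100.0


# Synthetic points (kbps, dB), NOT data from the paper: the test codec reaches
# the same PSNR at 80% of the anchor's rate, so BD-Rate should be -20%.
anchor_r = [1000.0, 1800.0, 3200.0, 6000.0]
anchor_q = [32.0, 34.5, 37.0, 39.5]
test_r = [0.8 * r for r in anchor_r]
print(round(bd_rate(anchor_r, anchor_q, test_r, anchor_q), 2))  # -20.0
```

In this sign convention, the 23.84% reduction reported in the abstract corresponds to a BD-Rate of -23.84%. The original formulation uses base-10 logarithms of the rate; the natural logarithm used here yields the same percentage because the exponential undoes whichever base is chosen.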




Source journal

IEEE Transactions on Multimedia (Engineering & Technology: Telecommunications)

CiteScore: 11.70

Self-citation rate: 11.00%

Articles published: 576

Review time: 5.5 months

Journal description: The IEEE Transactions on Multimedia covers diverse aspects of multimedia technology and applications, including circuits, networking, signal processing, systems, software, and systems integration. Its scope aligns with the Fields of Interest of its sponsoring societies, supporting a comprehensive exploration of multimedia research.