Adaptive Latitude-Aware and Importance-Activated Transform Coding for Learned Omnidirectional Image Compression

IF 4.8; Zone 1, Computer Science; JCR Q2, ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Transactions on Broadcasting; Pub Date: 2025-03-15; DOI: 10.1109/TBC.2025.3565895

Hui Hu; Yunhui Shi; Jin Wang; Nam Ling; Baocai Yin

IEEE Transactions on Broadcasting, vol. 71, no. 3, pp. 874-888. Full text: https://ieeexplore.ieee.org/document/11005398/

Citations: 0

Abstract


Adaptive Latitude-Aware and Importance-Activated Transform Coding for Learned Omnidirectional Image Compression

Based on the measured latitude and longitude, users can freely view an omnidirectional image from different perspectives. Omnidirectional images are typically represented in the equirectangular projection (ERP) format. Although ERP images suffer from distortion and redundancy due to oversampling, which makes traditional codecs inefficient, they maintain visual consistency and enhance compatibility with deep learning-based image processing tools. This has led to the emergence of end-to-end omnidirectional image compression methods based on the ERP format. However, transform coding, a key component of learned planar image compression, has not yet been fully explored for learned omnidirectional image compression. In this paper, we propose a transform coding method with adaptive latitude-aware and importance-activated features for omnidirectional image compression. Specifically, the adaptive latitude-aware mechanism comprises two modules. The first, termed the Adaptive Latitude-aware Module (ALAM), employs rectangular dilated convolutional kernels of multiple sizes to perceive distortion redundancy across different latitudes, followed by latitude-adaptive weighting to select the optimal features for each latitude. The second, named the Multi-scale Convolutional Gated Feedforward Network (MCGFN), exploits local contextual information while suppressing the feature redundancy induced by the diverse dilated convolutions in the first module. To further reduce ERP redundancy, we design an importance-activated spatial feature transform module that regulates the latent representations so that more bits are allocated to significant regions. Experimental results demonstrate that the proposed method outperforms the VVC standard and existing learning-based omnidirectional image compression approaches at medium-to-high bitrates while maintaining low computational complexity.
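The oversampling mentioned in the abstract can be made concrete with a small calculation. The sketch below is not taken from the paper; it assumes the usual ERP geometry, in which row i of an H-row image is centred at latitude (0.5 - (i + 0.5) / H) * pi and a pixel in that row covers a spherical area proportional to the cosine of that latitude. The function names are hypothetical and chosen only for illustration.

```python
import math


def erp_row_latitudes(height):
    """Latitude in radians of each ERP row centre, top (+pi/2) to bottom (-pi/2)."""
    return [(0.5 - (i + 0.5) / height) * math.pi for i in range(height)]


def erp_row_weights(height):
    """Relative spherical area covered by one pixel in each row: cos(latitude)."""
    return [math.cos(lat) for lat in erp_row_latitudes(height)]


def oversampling_fraction(height):
    """Share of ERP pixels that is redundant relative to the equator's sampling density."""
    weights = erp_row_weights(height)
    return 1.0 - sum(weights) / height


if __name__ == "__main__":
    h = 512
    w = erp_row_weights(h)
    print(f"weight at the equator row: {w[h // 2]:.4f}")
    print(f"weight at the top row:     {w[0]:.4f}")
    print(f"redundant fraction:        {oversampling_fraction(h):.4f}")
```

Under these assumptions the mean row weight approaches 2/pi, so a little over a third of the ERP samples are redundant, and the redundancy is concentrated near the poles. This is the kind of latitude dependence that a latitude-aware transform could exploit, although the paper's actual weighting is learned rather than fixed.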


Source journal

IEEE Transactions on Broadcasting, Engineering and Technology - Telecommunications

CiteScore

9.40

Self-citation rate

31.10%

Articles published

Review time

6-12 weeks

Journal description: The Society’s Field of Interest is “Devices, equipment, techniques and systems related to broadcast technology, including the production, distribution, transmission, and propagation aspects.” In addition to this formal FOI statement, which is used to provide guidance to the Publications Committee in the selection of content, the AdCom has further resolved that “broadcast systems includes all aspects of transmission, propagation, and reception.”