{"title":"利用通道空间非线性变换探索高维特征空间用于学习图像压缩","authors":"Wen Tan, Fanyang Meng, Chao Li, Youneng Bao, Yongsheng Liang","doi":"10.1049/cit2.70025","DOIUrl":null,"url":null,"abstract":"<p>Nonlinear transforms have significantly advanced learned image compression (LIC), particularly using residual blocks. This transform enhances the nonlinear expression ability and obtain compact feature representation by enlarging the receptive field, which indicates how the convolution process extracts features in a high dimensional feature space. However, its functionality is restricted to the spatial dimension and network depth, limiting further improvements in network performance due to insufficient information interaction and representation. Crucially, the potential of high dimensional feature space in the channel dimension and the exploration of network width/resolution remain largely untapped. In this paper, we consider nonlinear transforms from the perspective of feature space, defining high-dimensional feature spaces in different dimensions and investigating the specific effects. Firstly, we introduce the dimension increasing and decreasing transforms in both channel and spatial dimensions to obtain high dimensional feature space and achieve better feature extraction. Secondly, we design a channel-spatial fusion residual transform (CSR), which incorporates multi-dimensional transforms for a more effective representation. Furthermore, we simplify the proposed fusion transform to obtain a slim architecture (CSR-sm), balancing network complexity and compression performance. Finally, we build the overall network with stacked CSR transforms to achieve better compression and reconstruction. Experimental results demonstrate that the proposed method can achieve superior rate-distortion performance compared to the existing LIC methods and traditional codecs. Specifically, our proposed method achieves 9.38% BD-rate reduction over VVC on Kodak dataset.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"10 4","pages":"1235-1253"},"PeriodicalIF":7.3000,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.70025","citationCount":"0","resultStr":"{\"title\":\"Exploring High Dimensional Feature Space With Channel-Spatial Nonlinear Transforms for Learned Image Compression\",\"authors\":\"Wen Tan, Fanyang Meng, Chao Li, Youneng Bao, Yongsheng Liang\",\"doi\":\"10.1049/cit2.70025\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Nonlinear transforms have significantly advanced learned image compression (LIC), particularly using residual blocks. This transform enhances the nonlinear expression ability and obtain compact feature representation by enlarging the receptive field, which indicates how the convolution process extracts features in a high dimensional feature space. However, its functionality is restricted to the spatial dimension and network depth, limiting further improvements in network performance due to insufficient information interaction and representation. Crucially, the potential of high dimensional feature space in the channel dimension and the exploration of network width/resolution remain largely untapped. In this paper, we consider nonlinear transforms from the perspective of feature space, defining high-dimensional feature spaces in different dimensions and investigating the specific effects. 
Firstly, we introduce the dimension increasing and decreasing transforms in both channel and spatial dimensions to obtain high dimensional feature space and achieve better feature extraction. Secondly, we design a channel-spatial fusion residual transform (CSR), which incorporates multi-dimensional transforms for a more effective representation. Furthermore, we simplify the proposed fusion transform to obtain a slim architecture (CSR-sm), balancing network complexity and compression performance. Finally, we build the overall network with stacked CSR transforms to achieve better compression and reconstruction. Experimental results demonstrate that the proposed method can achieve superior rate-distortion performance compared to the existing LIC methods and traditional codecs. Specifically, our proposed method achieves 9.38% BD-rate reduction over VVC on Kodak dataset.</p>\",\"PeriodicalId\":46211,\"journal\":{\"name\":\"CAAI Transactions on Intelligence Technology\",\"volume\":\"10 4\",\"pages\":\"1235-1253\"},\"PeriodicalIF\":7.3000,\"publicationDate\":\"2025-06-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.70025\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"CAAI Transactions on Intelligence Technology\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ietresearch.onlinelibrary.wiley.com/doi/10.1049/cit2.70025\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"CAAI Transactions on Intelligence Technology","FirstCategoryId":"94","ListUrlMain":"https://ietresearch.onlinelibrary.wiley.com/doi/10.1049/cit2.70025","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Exploring High Dimensional Feature Space With Channel-Spatial Nonlinear Transforms for Learned Image Compression
Nonlinear transforms, particularly those built on residual blocks, have significantly advanced learned image compression (LIC). By enlarging the receptive field, the residual transform strengthens the nonlinear expression ability and yields compact feature representations, reflecting how the convolution process extracts features in a high-dimensional feature space. However, its functionality is confined to the spatial dimension and network depth, and the resulting lack of information interaction and representation limits further gains in network performance. Crucially, the potential of the high-dimensional feature space along the channel dimension, and the exploration of network width and resolution, remain largely untapped. In this paper, we consider nonlinear transforms from the perspective of feature space, defining high-dimensional feature spaces along different dimensions and investigating their specific effects. First, we introduce dimension-increasing and dimension-decreasing transforms in both the channel and spatial dimensions to obtain a high-dimensional feature space and achieve better feature extraction. Second, we design a channel-spatial fusion residual transform (CSR), which incorporates multi-dimensional transforms for a more effective representation. Furthermore, we simplify the proposed fusion transform into a slim architecture (CSR-sm) that balances network complexity and compression performance. Finally, we build the overall network with stacked CSR transforms to achieve better compression and reconstruction. Experimental results demonstrate that the proposed method achieves superior rate-distortion performance compared with existing LIC methods and traditional codecs. Specifically, it achieves a 9.38% BD-rate reduction over VVC on the Kodak dataset.
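As a rough illustration of the channel-spatial idea described in the abstract, the sketch below implements a residual block that raises and then lowers the feature dimension along both the channel axis (1x1 expansion and reduction) and the spatial axis (pixel unshuffle and shuffle) before fusing the two branches. The class name, expansion ratio, activation, and fusion layout are assumptions made for illustration; this is not the paper's exact CSR transform.

```python
import torch
import torch.nn as nn


class ChannelSpatialResidualBlock(nn.Module):
    """Hypothetical channel-spatial residual block (illustrative only).

    Lifts features into a higher-dimensional space along the channel axis
    (1x1 expansion/reduction) and the spatial axis (pixel unshuffle/shuffle),
    then fuses the two branches with a residual connection.
    """

    def __init__(self, channels: int, expand: int = 2):
        super().__init__()
        # Channel branch: increase then decrease the channel dimension.
        self.channel_branch = nn.Sequential(
            nn.Conv2d(channels, channels * expand, kernel_size=1),
            nn.GELU(),
            nn.Conv2d(channels * expand, channels, kernel_size=1),
        )
        # Spatial branch: trade resolution for channels (dimension increase),
        # filter in the higher-dimensional space, then restore the resolution.
        self.spatial_branch = nn.Sequential(
            nn.PixelUnshuffle(2),  # C -> 4C, H/2 x W/2
            nn.Conv2d(channels * 4, channels * 4, kernel_size=3, padding=1),
            nn.GELU(),
            nn.PixelShuffle(2),    # 4C -> C, H x W
        )
        # 1x1 fusion of the concatenated branch outputs.
        self.fuse = nn.Conv2d(channels * 2, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        c = self.channel_branch(x)
        s = self.spatial_branch(x)
        return x + self.fuse(torch.cat([c, s], dim=1))


if __name__ == "__main__":
    block = ChannelSpatialResidualBlock(channels=192)
    y = block(torch.randn(1, 192, 64, 64))
    print(y.shape)  # torch.Size([1, 192, 64, 64])
```

In an LIC codec, blocks of this kind would typically be stacked inside the analysis and synthesis transforms; the stacking depth and channel width used in the paper are not specified in the abstract.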
Journal Introduction:
CAAI Transactions on Intelligence Technology is a leading venue for original research on the theoretical and experimental aspects of artificial intelligence technology. We are a fully open access journal co-published by the Institution of Engineering and Technology (IET) and the Chinese Association for Artificial Intelligence (CAAI), providing research that is openly accessible to read and share worldwide.