{"title":"利用通道空间非线性变换探索高维特征空间用于学习图像压缩","authors":"Wen Tan, Fanyang Meng, Chao Li, Youneng Bao, Yongsheng Liang","doi":"10.1049/cit2.70025","DOIUrl":null,"url":null,"abstract":"<p>Nonlinear transforms have significantly advanced learned image compression (LIC), particularly using residual blocks. This transform enhances the nonlinear expression ability and obtain compact feature representation by enlarging the receptive field, which indicates how the convolution process extracts features in a high dimensional feature space. However, its functionality is restricted to the spatial dimension and network depth, limiting further improvements in network performance due to insufficient information interaction and representation. Crucially, the potential of high dimensional feature space in the channel dimension and the exploration of network width/resolution remain largely untapped. In this paper, we consider nonlinear transforms from the perspective of feature space, defining high-dimensional feature spaces in different dimensions and investigating the specific effects. Firstly, we introduce the dimension increasing and decreasing transforms in both channel and spatial dimensions to obtain high dimensional feature space and achieve better feature extraction. Secondly, we design a channel-spatial fusion residual transform (CSR), which incorporates multi-dimensional transforms for a more effective representation. Furthermore, we simplify the proposed fusion transform to obtain a slim architecture (CSR-sm), balancing network complexity and compression performance. Finally, we build the overall network with stacked CSR transforms to achieve better compression and reconstruction. Experimental results demonstrate that the proposed method can achieve superior rate-distortion performance compared to the existing LIC methods and traditional codecs. Specifically, our proposed method achieves 9.38% BD-rate reduction over VVC on Kodak dataset.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"10 4","pages":"1235-1253"},"PeriodicalIF":7.3000,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.70025","citationCount":"0","resultStr":"{\"title\":\"Exploring High Dimensional Feature Space With Channel-Spatial Nonlinear Transforms for Learned Image Compression\",\"authors\":\"Wen Tan, Fanyang Meng, Chao Li, Youneng Bao, Yongsheng Liang\",\"doi\":\"10.1049/cit2.70025\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Nonlinear transforms have significantly advanced learned image compression (LIC), particularly using residual blocks. This transform enhances the nonlinear expression ability and obtain compact feature representation by enlarging the receptive field, which indicates how the convolution process extracts features in a high dimensional feature space. However, its functionality is restricted to the spatial dimension and network depth, limiting further improvements in network performance due to insufficient information interaction and representation. Crucially, the potential of high dimensional feature space in the channel dimension and the exploration of network width/resolution remain largely untapped. In this paper, we consider nonlinear transforms from the perspective of feature space, defining high-dimensional feature spaces in different dimensions and investigating the specific effects. 
Firstly, we introduce the dimension increasing and decreasing transforms in both channel and spatial dimensions to obtain high dimensional feature space and achieve better feature extraction. Secondly, we design a channel-spatial fusion residual transform (CSR), which incorporates multi-dimensional transforms for a more effective representation. Furthermore, we simplify the proposed fusion transform to obtain a slim architecture (CSR-sm), balancing network complexity and compression performance. Finally, we build the overall network with stacked CSR transforms to achieve better compression and reconstruction. Experimental results demonstrate that the proposed method can achieve superior rate-distortion performance compared to the existing LIC methods and traditional codecs. Specifically, our proposed method achieves 9.38% BD-rate reduction over VVC on Kodak dataset.</p>\",\"PeriodicalId\":46211,\"journal\":{\"name\":\"CAAI Transactions on Intelligence Technology\",\"volume\":\"10 4\",\"pages\":\"1235-1253\"},\"PeriodicalIF\":7.3000,\"publicationDate\":\"2025-06-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.70025\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"CAAI Transactions on Intelligence Technology\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ietresearch.onlinelibrary.wiley.com/doi/10.1049/cit2.70025\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"CAAI Transactions on Intelligence Technology","FirstCategoryId":"94","ListUrlMain":"https://ietresearch.onlinelibrary.wiley.com/doi/10.1049/cit2.70025","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Exploring High Dimensional Feature Space With Channel-Spatial Nonlinear Transforms for Learned Image Compression
Nonlinear transforms, particularly those built on residual blocks, have significantly advanced learned image compression (LIC). By enlarging the receptive field, the residual transform strengthens the nonlinear expression ability and yields compact feature representations, reflecting how the convolution process extracts features in a high-dimensional feature space. However, its functionality is confined to the spatial dimension and network depth, and the resulting lack of information interaction and representation limits further gains in network performance. Crucially, the potential of the high-dimensional feature space along the channel dimension, and the exploration of network width and resolution, remain largely untapped. In this paper, we consider nonlinear transforms from the perspective of feature space, defining high-dimensional feature spaces along different dimensions and investigating their specific effects. First, we introduce dimension-increasing and dimension-decreasing transforms in both the channel and spatial dimensions to obtain a high-dimensional feature space and achieve better feature extraction. Second, we design a channel-spatial fusion residual transform (CSR), which incorporates multi-dimensional transforms for a more effective representation. Furthermore, we simplify the proposed fusion transform into a slim architecture (CSR-sm) that balances network complexity and compression performance. Finally, we build the overall network with stacked CSR transforms to achieve better compression and reconstruction. Experimental results demonstrate that the proposed method achieves superior rate-distortion performance compared with existing LIC methods and traditional codecs. Specifically, it achieves a 9.38% BD-rate reduction over VVC on the Kodak dataset.
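As a rough illustration of the channel-spatial idea described in the abstract, the sketch below implements a residual block that raises and then lowers the feature dimension along both the channel axis (1x1 expansion and reduction) and the spatial axis (pixel unshuffle and shuffle) before fusing the two branches. The class name, expansion ratio, activation, and fusion layout are assumptions made for illustration; this is not the paper's exact CSR transform.

```python
import torch
import torch.nn as nn


class ChannelSpatialResidualBlock(nn.Module):
    """Hypothetical channel-spatial residual block (illustrative only).

    Lifts features into a higher-dimensional space along the channel axis
    (1x1 expansion/reduction) and the spatial axis (pixel unshuffle/shuffle),
    then fuses the two branches with a residual connection.
    """

    def __init__(self, channels: int, expand: int = 2):
        super().__init__()
        # Channel branch: increase then decrease the channel dimension.
        self.channel_branch = nn.Sequential(
            nn.Conv2d(channels, channels * expand, kernel_size=1),
            nn.GELU(),
            nn.Conv2d(channels * expand, channels, kernel_size=1),
        )
        # Spatial branch: trade resolution for channels (dimension increase),
        # filter in the higher-dimensional space, then restore the resolution.
        self.spatial_branch = nn.Sequential(
            nn.PixelUnshuffle(2),  # C -> 4C, H/2 x W/2
            nn.Conv2d(channels * 4, channels * 4, kernel_size=3, padding=1),
            nn.GELU(),
            nn.PixelShuffle(2),    # 4C -> C, H x W
        )
        # 1x1 fusion of the concatenated branch outputs.
        self.fuse = nn.Conv2d(channels * 2, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        c = self.channel_branch(x)
        s = self.spatial_branch(x)
        return x + self.fuse(torch.cat([c, s], dim=1))


if __name__ == "__main__":
    block = ChannelSpatialResidualBlock(channels=192)
    y = block(torch.randn(1, 192, 64, 64))
    print(y.shape)  # torch.Size([1, 192, 64, 64])
```

In an LIC codec, blocks of this kind would typically be stacked inside the analysis and synthesis transforms; the stacking depth and channel width used in the paper are not specified in the abstract.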
Journal Introduction:
CAAI Transactions on Intelligence Technology is a leading venue for original research on the theoretical and experimental aspects of artificial intelligence technology. We are a fully open access journal co-published by the Institution of Engineering and Technology (IET) and the Chinese Association for Artificial Intelligence (CAAI), providing research that is openly accessible to read and share worldwide.