Title: Accurate entropy modeling in learned image compression with joint enhanced SwinT and CNN
Authors: Dongjian Yang, Xiaopeng Fan, Xiandong Meng, Debin Zhao
DOI: 10.1007/s00530-024-01405-w
Publication date: 2024-07-07
Abstract
Recently, learned image compression (LIC) has shown significant research potential. Most existing LIC methods are CNN-based, transformer-based, or a mixture of the two. However, these methods suffer from some degradation in global attention performance: CNNs have limited-size convolution kernels, while transformers apply window partitioning to reduce computational complexity. This gives rise to two issues: (1) the main autoencoder (AE) and hyper AE exhibit limited transformation capability due to insufficient global modeling, making it challenging to improve the accuracy of the coarse-grained entropy model; and (2) the fine-grained entropy model struggles to adaptively exploit a larger range of contexts because of its weaker global modeling capability. In this paper, we propose an LIC method with jointly enhanced Swin Transformer (SwinT) and CNN to improve entropy modeling accuracy. The key idea is to enhance the global modeling ability of SwinT by introducing neighborhood window attention, while maintaining acceptable computational complexity, and to combine it with the local modeling ability of CNN to form the enhanced SwinT and CNN block (ESTCB). Specifically, we rebuild the main AE and hyper AE of LIC on ESTCB, enhancing their global transformation capability and yielding a more accurate coarse-grained entropy model. In addition, we combine ESTCB with a checkerboard mask and a channel autoregressive model to develop a spatial-then-channel fine-grained entropy model, expanding the range of contexts the LIC model can adaptively reference. Comprehensive experiments demonstrate that the proposed method achieves state-of-the-art rate-distortion performance compared with existing LIC models.
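To make the spatial-then-channel idea concrete, the sketch below shows the checkerboard spatial split that such fine-grained entropy models typically use: anchor positions are decoded first (conditioned only on the hyperprior), and the remaining positions then condition on their already-decoded spatial neighbors. This is a minimal illustrative sketch of the generic checkerboard scheme, not the authors' implementation; the function name and shapes are assumptions.

```python
import numpy as np

def checkerboard_masks(h, w):
    """Return boolean masks for the two spatial decoding passes.

    Pass 1 ("anchor") covers the positions where row + col is even;
    pass 2 ("non-anchor") covers the complementary positions, so the
    two passes tile the latent grid exactly.
    """
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    anchor = (yy + xx) % 2 == 0   # decoded first, hyperprior context only
    non_anchor = ~anchor          # decoded second, sees decoded anchors
    return anchor, non_anchor

# Example on a 4x4 latent grid: each pass covers exactly half the
# positions, and every position belongs to exactly one pass.
anchor, non_anchor = checkerboard_masks(4, 4)
assert anchor.sum() == non_anchor.sum() == 8
assert np.all(anchor ^ non_anchor)
```

A channel autoregressive model then further splits the latent channels into slices decoded in sequence, so each slice can additionally condition on previously decoded slices; combining both gives the spatial-then-channel context ordering described above.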