Pixel integration from fine to coarse for lightweight image super-resolution
Yuxiang Wu, Xiaoyan Wang, Xiaoyan Liu, Yuzhao Gao, Yan Dou
Image and Vision Computing, Volume 154, Article 105362 (February 2025). DOI: 10.1016/j.imavis.2024.105362
Citations: 0
Abstract
Recently, Transformer-based methods have made significant progress in image super-resolution. They encode long-range dependencies between image patches through a self-attention mechanism. However, extracting tokens from the entire feature map is computationally expensive. In this paper, we propose a novel lightweight image super-resolution approach, the pixel integration network (PIN). Specifically, our method performs fine pixel integration and coarse pixel integration over local and global receptive fields, respectively. In particular, coarse pixel integration is implemented by retractable attention, consisting of dense and sparse self-attention. To enrich features with contextual information, a spatial-gate mechanism and depth-wise convolution are introduced into the multi-layer perceptron. In addition, a spatial frequency fusion block is adopted at the end of deep feature extraction to obtain more comprehensive, detailed, and stable information. Extensive experiments demonstrate that PIN achieves state-of-the-art lightweight super-resolution performance with a small number of parameters.
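The spatial-gate mechanism combined with depth-wise convolution in the feed-forward part, as described in the abstract, can be pictured with a short PyTorch sketch. This is only an illustrative assumption of how such a block is commonly built, not the authors' released code: the module name SpatialGateFFN, the expansion ratio, and the half-channel gating split are all guesses.

```python
# A minimal sketch of a spatial-gated feed-forward block with depth-wise
# convolution, assumed for illustration; not the paper's reference implementation.
import torch
import torch.nn as nn


class SpatialGateFFN(nn.Module):
    """Feed-forward block whose hidden features are split into two halves;
    one half is enriched with local context by a depth-wise 3x3 convolution
    and used to gate the other half."""

    def __init__(self, dim: int, expansion: int = 2):
        super().__init__()
        hidden = dim * expansion
        self.fc1 = nn.Linear(dim, hidden)
        self.act = nn.GELU()
        # Depth-wise convolution over one half of the hidden channels.
        self.dwconv = nn.Conv2d(hidden // 2, hidden // 2, kernel_size=3,
                                padding=1, groups=hidden // 2)
        self.fc2 = nn.Linear(hidden // 2, dim)

    def forward(self, x: torch.Tensor, h: int, w: int) -> torch.Tensor:
        # x: (B, H*W, C) token sequence, as used by Transformer-style SR models.
        b, n, _ = x.shape
        x = self.act(self.fc1(x))
        gate, value = x.chunk(2, dim=-1)
        # Bring the gate branch back to a spatial map for the depth-wise conv.
        gate = gate.transpose(1, 2).reshape(b, -1, h, w)
        gate = self.dwconv(gate).flatten(2).transpose(1, 2)
        return self.fc2(gate * value)


if __name__ == "__main__":
    block = SpatialGateFFN(dim=64)
    tokens = torch.randn(1, 48 * 48, 64)   # a 48x48 feature map as tokens
    out = block(tokens, h=48, w=48)
    print(out.shape)  # torch.Size([1, 2304, 64])
```

In this sketch the gate branch passes through a 3x3 depth-wise convolution, so each token is modulated by its local spatial context before multiplying the value branch, which matches the abstract's goal of enriching features with contextual information at low computational cost.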
About the journal:
Image and Vision Computing aims primarily to provide an effective medium for exchanging the results of high-quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real-world scenes. It seeks to foster a deeper understanding of the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.