{"title":"Spatial and frequency information fusion transformer for image super-resolution","authors":"Yan Zhang, Fujie Xu, Yemei Sun, Jiao Wang","doi":"10.1016/j.neunet.2025.107351","DOIUrl":null,"url":null,"abstract":"<div><div>Previous works have indicated that Transformer-based models bring impressive image reconstruction performance in single image super-resolution (SISR). However, existing Transformer-based approaches utilize self-attention within non-overlapping windows. This restriction hinders the network’s ability to adopt large receptive fields, which are essential for capturing global information and establishing long-distance dependencies, especially in the early layers. To fully leverage global information and activate more pixels during the image reconstruction process, we have developed a Spatial and Frequency Information Fusion Transformer (SFFT) with an expansive receptive field. SFFT concurrently combines spatial and frequency domain information to comprehensively leverage their complementary strengths, capturing both local and global image features while integrating low and high-frequency information. Additionally, we utilize the overlapping cross-attention block (OCAB) to facilitate pixel transmission between adjacent windows, enhancing network performance. During the training stage, we incorporate the Fast Fourier Transform (FFT) loss, thereby fully leveraging the capabilities of our proposed modules and further tapping into the model’s potential. Extensive quantitative and qualitative evaluations on benchmark datasets indicate that the proposed algorithm surpasses state-of-the-art methods in terms of accuracy. Specifically, our method achieves a PSNR score of 32.67 dB on the Manga109 dataset, surpassing SwinIR by 0.64 dB and HAT by 0.19 dB, respectively. The source code and pre-trained models are available at <span><span>https://github.com/Xufujie/SFFT</span><svg><path></path></svg></span></div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"187 ","pages":"Article 107351"},"PeriodicalIF":6.0000,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neural Networks","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0893608025002308","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
Previous works have indicated that Transformer-based models deliver impressive image reconstruction performance in single image super-resolution (SISR). However, existing Transformer-based approaches apply self-attention within non-overlapping windows. This restriction hinders the network’s ability to exploit large receptive fields, which are essential for capturing global information and establishing long-distance dependencies, especially in the early layers. To fully leverage global information and activate more pixels during image reconstruction, we have developed a Spatial and Frequency Information Fusion Transformer (SFFT) with an expansive receptive field. SFFT concurrently combines spatial- and frequency-domain information to comprehensively exploit their complementary strengths, capturing both local and global image features while integrating low- and high-frequency information. Additionally, we utilize an overlapping cross-attention block (OCAB) to facilitate pixel transmission between adjacent windows, further enhancing network performance. During the training stage, we incorporate a Fast Fourier Transform (FFT) loss, thereby fully leveraging the capabilities of our proposed modules and further tapping the model’s potential. Extensive quantitative and qualitative evaluations on benchmark datasets indicate that the proposed algorithm surpasses state-of-the-art methods in accuracy. Specifically, our method achieves a PSNR of 32.67 dB on the Manga109 dataset, surpassing SwinIR by 0.64 dB and HAT by 0.19 dB. The source code and pre-trained models are available at https://github.com/Xufujie/SFFT.
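The abstract does not spell out the exact form of the FFT loss. A common formulation in frequency-aware super-resolution training is an L1 penalty between the Fourier spectra of the super-resolved output and the high-resolution ground truth; the sketch below assumes that formulation (the `FFTLoss` class name and the `loss_weight` parameter are illustrative, not taken from the paper or its released code):

```python
import torch
import torch.nn as nn

class FFTLoss(nn.Module):
    """Frequency-domain L1 loss (a sketch, not the paper's exact loss).

    Compares the 2D FFT spectra of the super-resolved output and the
    high-resolution ground truth, penalizing missing high-frequency
    detail that a plain pixel-space loss tends to under-weight.
    """

    def __init__(self, loss_weight: float = 0.1):
        super().__init__()
        self.loss_weight = loss_weight  # assumed weighting, not from the paper
        self.criterion = nn.L1Loss()

    def forward(self, sr: torch.Tensor, hr: torch.Tensor) -> torch.Tensor:
        # rfft2 returns complex tensors; compare their real and imaginary
        # parts (equivalently, one could compare amplitude and phase).
        sr_fft = torch.fft.rfft2(sr, norm="ortho")
        hr_fft = torch.fft.rfft2(hr, norm="ortho")
        loss = self.criterion(torch.view_as_real(sr_fft),
                              torch.view_as_real(hr_fft))
        return self.loss_weight * loss
```

In practice such a term is typically added to a pixel-space objective, e.g. `total = nn.L1Loss()(sr, hr) + FFTLoss()(sr, hr)`, so the frequency term supplements rather than replaces the spatial reconstruction loss.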
About the journal
Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically inspired artificial intelligence.