{"title":"面向图像超分辨率的语义驱动全局-局部融合变压器。","authors":"Kaibing Zhang,Zhouwei Cheng,Xin He,Jie Li,Xinbo Gao","doi":"10.1109/tip.2025.3609106","DOIUrl":null,"url":null,"abstract":"Image Super-Resolution (SR) has seen remarkable progress with the emergence of transformer-based architectures. However, due to the high computational cost, many existing transformer-based SR methods limit their attention to local windows, which hinders their ability to model long-range dependencies and global structures. To address these challenges, we propose a novel SR framework named Semantic-Driven Global-Local Fusion Transformer (SGLFT). The proposed model enhances the receptive field by combining a Hybrid Window Transformer (HWT) and a Scalable Transformer Module (STM) to jointly capture local textures and global context. To further strengthen the semantic consistency of reconstruction, we introduce a Semantic Extraction Module (SEM) that distills high-level semantic priors from the input. These semantic cues are adaptively integrated with visual features through an Adaptive Feature Fusion Semantic Integration Module (AFFSIM). Extensive experiments on standard benchmarks demonstrate the effectiveness of SGLFT in producing visually faithful and structurally consistent SR results. The code will be available at https://github.com/kbzhang0505/SGLFT.","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"22 1","pages":""},"PeriodicalIF":13.7000,"publicationDate":"2025-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Semantic-Driven Global-Local Fusion Transformer for Image Super-Resolution.\",\"authors\":\"Kaibing Zhang,Zhouwei Cheng,Xin He,Jie Li,Xinbo Gao\",\"doi\":\"10.1109/tip.2025.3609106\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Image Super-Resolution (SR) has seen remarkable progress with the emergence of transformer-based architectures. However, due to the high computational cost, many existing transformer-based SR methods limit their attention to local windows, which hinders their ability to model long-range dependencies and global structures. To address these challenges, we propose a novel SR framework named Semantic-Driven Global-Local Fusion Transformer (SGLFT). The proposed model enhances the receptive field by combining a Hybrid Window Transformer (HWT) and a Scalable Transformer Module (STM) to jointly capture local textures and global context. To further strengthen the semantic consistency of reconstruction, we introduce a Semantic Extraction Module (SEM) that distills high-level semantic priors from the input. These semantic cues are adaptively integrated with visual features through an Adaptive Feature Fusion Semantic Integration Module (AFFSIM). Extensive experiments on standard benchmarks demonstrate the effectiveness of SGLFT in producing visually faithful and structurally consistent SR results. 
The code will be available at https://github.com/kbzhang0505/SGLFT.\",\"PeriodicalId\":13217,\"journal\":{\"name\":\"IEEE Transactions on Image Processing\",\"volume\":\"22 1\",\"pages\":\"\"},\"PeriodicalIF\":13.7000,\"publicationDate\":\"2025-09-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Image Processing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1109/tip.2025.3609106\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Image Processing","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1109/tip.2025.3609106","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Image Super-Resolution (SR) has seen remarkable progress with the emergence of transformer-based architectures. However, due to the high computational cost, many existing transformer-based SR methods limit their attention to local windows, which hinders their ability to model long-range dependencies and global structures. To address these challenges, we propose a novel SR framework named Semantic-Driven Global-Local Fusion Transformer (SGLFT). The proposed model enhances the receptive field by combining a Hybrid Window Transformer (HWT) and a Scalable Transformer Module (STM) to jointly capture local textures and global context. To further strengthen the semantic consistency of reconstruction, we introduce a Semantic Extraction Module (SEM) that distills high-level semantic priors from the input. These semantic cues are adaptively integrated with visual features through an Adaptive Feature Fusion Semantic Integration Module (AFFSIM). Extensive experiments on standard benchmarks demonstrate the effectiveness of SGLFT in producing visually faithful and structurally consistent SR results. The code will be available at https://github.com/kbzhang0505/SGLFT.
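The abstract describes the architecture only at a high level, so the short PyTorch sketch below illustrates one plausible way such a pipeline could be composed: window-limited attention for local textures, coarse global attention over a downsampled token grid, a pooled semantic prior, and a gated fusion before pixel-shuffle upsampling. Every name and design choice in the sketch (HybridWindowBlock, ScalableGlobalBlock, SGLFTSketch, the channel widths, window size, and the sigmoid-gated fusion) is an illustrative assumption rather than the authors' implementation; the released code at the linked repository is the authoritative reference.

# A minimal sketch of a semantic-driven global-local SR pipeline.
# All module names, sizes, and internal designs are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HybridWindowBlock(nn.Module):
    """Local self-attention inside non-overlapping windows (stand-in for the HWT idea)."""
    def __init__(self, dim, window=8, heads=4):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                                   # x: (B, C, H, W)
        B, C, H, W = x.shape
        w = self.window
        # Partition the feature map into w x w windows and treat each as a token sequence.
        t = x.view(B, C, H // w, w, W // w, w).permute(0, 2, 4, 3, 5, 1).reshape(-1, w * w, C)
        n = self.norm(t)
        t = t + self.attn(n, n, n)[0]                       # attention restricted to each window
        t = t.reshape(B, H // w, W // w, w, w, C).permute(0, 5, 1, 3, 2, 4)
        return t.reshape(B, C, H, W)


class ScalableGlobalBlock(nn.Module):
    """Global attention over a downsampled token grid (stand-in for the STM idea)."""
    def __init__(self, dim, scale=4, heads=4):
        super().__init__()
        self.scale = scale
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        B, C, H, W = x.shape
        g = F.adaptive_avg_pool2d(x, (H // self.scale, W // self.scale))
        t = g.flatten(2).transpose(1, 2)                    # coarse tokens covering the whole image
        n = self.norm(t)
        t = t + self.attn(n, n, n)[0]
        g = t.transpose(1, 2).reshape(B, C, H // self.scale, W // self.scale)
        # Upsample the globally attended context back and add it to the local features.
        return x + F.interpolate(g, size=(H, W), mode="bilinear", align_corners=False)


class SGLFTSketch(nn.Module):
    """Illustrative pipeline: shallow conv -> local + global attention -> semantic-gated fusion -> upsample."""
    def __init__(self, dim=64, upscale=4):
        super().__init__()
        self.shallow = nn.Conv2d(3, dim, 3, padding=1)
        self.local_branch = HybridWindowBlock(dim)
        self.global_branch = ScalableGlobalBlock(dim)
        # Stand-in for SEM: a pooled high-level prior extracted from the low-resolution input.
        self.semantic = nn.Sequential(
            nn.Conv2d(3, dim, 3, stride=4, padding=1), nn.GELU(), nn.AdaptiveAvgPool2d(1))
        # Stand-in for AFFSIM: a 1x1 conv producing a sigmoid gate from concatenated features.
        self.fuse = nn.Conv2d(2 * dim, dim, 1)
        self.up = nn.Sequential(
            nn.Conv2d(dim, 3 * upscale ** 2, 3, padding=1), nn.PixelShuffle(upscale))

    def forward(self, lr):
        feat = self.shallow(lr)
        feat = self.global_branch(self.local_branch(feat))
        prior = self.semantic(lr).expand_as(feat)
        gate = torch.sigmoid(self.fuse(torch.cat([feat, prior], dim=1)))
        return self.up(feat * gate + feat)                  # semantically modulated reconstruction


if __name__ == "__main__":
    sr = SGLFTSketch()(torch.randn(1, 3, 64, 64))
    print(sr.shape)                                         # torch.Size([1, 3, 256, 256])

The toy forward pass at the bottom only verifies tensor shapes (a 64x64 input upscaled 4x to 256x256); the results reported in the paper come from the authors' full model and training pipeline.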
Journal introduction:
The IEEE Transactions on Image Processing delves into groundbreaking theories, algorithms, and structures concerning the generation, acquisition, manipulation, transmission, scrutiny, and presentation of images, video, and multidimensional signals across diverse applications. Topics span mathematical, statistical, and perceptual aspects, encompassing modeling, representation, formation, coding, filtering, enhancement, restoration, rendering, halftoning, search, and analysis of images, video, and multidimensional signals. Pertinent applications range from image and video communications to electronic imaging, biomedical imaging, image and video systems, and remote sensing.