{"title":"Multi-Scale Feature Guided Transformer for Image Inpainting","authors":"Zeji Huang, Huanda Lu, Xin Yu, Hui Xiao","doi":"10.1049/ipr2.70105","DOIUrl":null,"url":null,"abstract":"<p>In recent years, image restoration has witnessed remarkable advancements. However, reconstructing visually plausible textures while preserving global structural coherence remains a persistent challenge. Existing convolutional neural network (CNN)-based approaches are inherently limited by their local receptive fields, often struggling to capture global structure. Previously proposed methods mostly focus on structural priors to address the limitation of CNN's receptive field, but we believe that texture priors are also critical factors that influence the quality of image inpainting. To tackle semantic inconsistency and texture blurriness in current methods, we introduce a novel multi-stage restoration framework. Specifically, our architecture incorporates a dual-stream U-Net with attention mechanisms to extract multi-scale features. The mixed attention-gated feature fusion module exchanges and combines structure and texture features to generate multi-scale fused feature maps, which are progressively merged into the decoder to guide the Transformer to generate more realistic images. Additionally, we propose a feature selection feedforward network to replace traditional MLPs in Transformer blocks for adaptive feature refinement. Extensive experiments on CelebA-HQ and Paris StreetView datasets demonstrate superior performance both qualitatively and quantitatively compared to state-of-the-art methods.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0000,"publicationDate":"2025-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70105","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IET Image Processing","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1049/ipr2.70105","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
In recent years, image restoration has witnessed remarkable advancements. However, reconstructing visually plausible textures while preserving global structural coherence remains a persistent challenge. Existing convolutional neural network (CNN)-based approaches are inherently limited by their local receptive fields and often struggle to capture global structure. Previously proposed methods mostly rely on structural priors to compensate for the limited receptive field of CNNs, but we believe that texture priors are also critical to the quality of image inpainting. To tackle the semantic inconsistency and texture blurriness of current methods, we introduce a novel multi-stage restoration framework. Specifically, our architecture incorporates a dual-stream U-Net with attention mechanisms to extract multi-scale features. A mixed attention-gated feature fusion module exchanges and combines structure and texture features to generate multi-scale fused feature maps, which are progressively merged into the decoder to guide the Transformer towards generating more realistic images. Additionally, we propose a feature selection feedforward network that replaces the traditional MLPs in Transformer blocks for adaptive feature refinement. Extensive experiments on the CelebA-HQ and Paris StreetView datasets demonstrate superior qualitative and quantitative performance compared with state-of-the-art methods.
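To make the two modules named in the abstract more concrete, the following is a minimal PyTorch sketch of how an attention-gated fusion of structure and texture features and a gated feature-selection feedforward network might look. The abstract gives no implementation details, so the 1x1 convolution gate, the sigmoid selection branch, the expansion ratio, and all class and parameter names (GatedFeatureFusion, FeatureSelectionFFN) are illustrative assumptions rather than the authors' design.

```python
# Hypothetical sketch only: layer choices, shapes, and gating schemes are
# assumptions; the paper's actual modules are not specified in the abstract.
import torch
import torch.nn as nn


class GatedFeatureFusion(nn.Module):
    """Assumed attention-gated fusion of structure and texture feature maps."""

    def __init__(self, channels: int):
        super().__init__()
        # A 1x1 convolution predicts a per-pixel gate from the concatenated streams.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, structure: torch.Tensor, texture: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([structure, texture], dim=1))
        # Convex combination: the gate decides how much each stream contributes.
        return g * structure + (1.0 - g) * texture


class FeatureSelectionFFN(nn.Module):
    """Assumed gated feedforward network standing in for the Transformer MLP."""

    def __init__(self, dim: int, expansion: int = 4):
        super().__init__()
        hidden = dim * expansion
        self.value = nn.Linear(dim, hidden)
        self.select = nn.Sequential(nn.Linear(dim, hidden), nn.Sigmoid())
        self.out = nn.Linear(hidden, dim)
        self.act = nn.GELU()

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # An element-wise selection mask suppresses uninformative channels
        # before projecting back to the token dimension.
        return self.out(self.act(self.value(tokens)) * self.select(tokens))


if __name__ == "__main__":
    fusion = GatedFeatureFusion(channels=64)
    fused = fusion(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
    ffn = FeatureSelectionFFN(dim=256)
    refined = ffn(torch.randn(1, 1024, 256))
    print(fused.shape, refined.shape)
```

Under these assumptions, the fusion module would sit between the dual-stream encoder and the decoder at each scale, while the feedforward module would replace the MLP inside each Transformer block; how the paper actually wires them is not stated in the abstract.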
Journal description:
The IET Image Processing journal encompasses research areas related to the generation, processing and communication of visual information. The focus of the journal is the coverage of the latest research results in image and video processing, including image generation and display, enhancement and restoration, segmentation, colour and texture analysis, coding and communication, implementations and architectures as well as innovative applications.
Principal topics include:
Generation and Display - Imaging sensors and acquisition systems, illumination, sampling and scanning, quantization, colour reproduction, image rendering, display and printing systems, evaluation of image quality.
Processing and Analysis - Image enhancement, restoration, segmentation, registration, multispectral, colour and texture processing, multiresolution processing and wavelets, morphological operations, stereoscopic and 3-D processing, motion detection and estimation, video and image sequence processing.
Implementations and Architectures - Image and video processing hardware and software, design and construction, architectures and software, neural, adaptive, and fuzzy processing.
Coding and Transmission - Image and video compression and coding, compression standards, noise modelling, visual information networks, streamed video.
Retrieval and Multimedia - Storage of images and video, database design, image retrieval, video annotation and editing, mixed media incorporating visual information, multimedia systems and applications, image and video watermarking, steganography.
Applications - Innovative application of image and video processing technologies to any field, including life sciences, earth sciences, astronomy, document processing and security.
Current Special Issue Calls for Papers:
Evolutionary Computation for Image Processing - https://digital-library.theiet.org/files/IET_IPR_CFP_EC.pdf
AI-Powered 3D Vision - https://digital-library.theiet.org/files/IET_IPR_CFP_AIPV.pdf
Multidisciplinary advancement of Imaging Technologies: From Medical Diagnostics and Genomics to Cognitive Machine Vision, and Artificial Intelligence - https://digital-library.theiet.org/files/IET_IPR_CFP_IST.pdf
Deep Learning for 3D Reconstruction - https://digital-library.theiet.org/files/IET_IPR_CFP_DLR.pdf