{"title":"Multi-Scale Feature Guided Transformer for Image Inpainting","authors":"Zeji Huang, Huanda Lu, Xin Yu, Hui Xiao","doi":"10.1049/ipr2.70105","DOIUrl":null,"url":null,"abstract":"<p>In recent years, image restoration has witnessed remarkable advancements. However, reconstructing visually plausible textures while preserving global structural coherence remains a persistent challenge. Existing convolutional neural network (CNN)-based approaches are inherently limited by their local receptive fields, often struggling to capture global structure. Previously proposed methods mostly focus on structural priors to address the limitation of CNN's receptive field, but we believe that texture priors are also critical factors that influence the quality of image inpainting. To tackle semantic inconsistency and texture blurriness in current methods, we introduce a novel multi-stage restoration framework. Specifically, our architecture incorporates a dual-stream U-Net with attention mechanisms to extract multi-scale features. The mixed attention-gated feature fusion module exchanges and combines structure and texture features to generate multi-scale fused feature maps, which are progressively merged into the decoder to guide the Transformer to generate more realistic images. Additionally, we propose a feature selection feedforward network to replace traditional MLPs in Transformer blocks for adaptive feature refinement. Extensive experiments on CelebA-HQ and Paris StreetView datasets demonstrate superior performance both qualitatively and quantitatively compared to state-of-the-art methods.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0000,"publicationDate":"2025-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70105","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IET Image Processing","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1049/ipr2.70105","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
In recent years, image restoration has witnessed remarkable advancements. However, reconstructing visually plausible textures while preserving global structural coherence remains a persistent challenge. Existing convolutional neural network (CNN)-based approaches are inherently limited by their local receptive fields and often struggle to capture global structure. Previously proposed methods mostly rely on structural priors to compensate for the limited receptive field of CNNs, but we believe that texture priors are also critical to the quality of image inpainting. To tackle the semantic inconsistency and texture blurriness of current methods, we introduce a novel multi-stage restoration framework. Specifically, our architecture incorporates a dual-stream U-Net with attention mechanisms to extract multi-scale features. A mixed attention-gated feature fusion module exchanges and combines structure and texture features to generate multi-scale fused feature maps, which are progressively merged into the decoder to guide the Transformer towards generating more realistic images. Additionally, we propose a feature selection feedforward network that replaces the traditional MLPs in Transformer blocks for adaptive feature refinement. Extensive experiments on the CelebA-HQ and Paris StreetView datasets demonstrate superior qualitative and quantitative performance compared with state-of-the-art methods.
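To make the two modules named in the abstract more concrete, the following is a minimal PyTorch sketch of how an attention-gated fusion of structure and texture features and a gated feature-selection feedforward network might look. The abstract gives no implementation details, so the 1x1 convolution gate, the sigmoid selection branch, the expansion ratio, and all class and parameter names (GatedFeatureFusion, FeatureSelectionFFN) are illustrative assumptions rather than the authors' design.

```python
# Hypothetical sketch only: layer choices, shapes, and gating schemes are
# assumptions; the paper's actual modules are not specified in the abstract.
import torch
import torch.nn as nn


class GatedFeatureFusion(nn.Module):
    """Assumed attention-gated fusion of structure and texture feature maps."""

    def __init__(self, channels: int):
        super().__init__()
        # A 1x1 convolution predicts a per-pixel gate from the concatenated streams.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, structure: torch.Tensor, texture: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([structure, texture], dim=1))
        # Convex combination: the gate decides how much each stream contributes.
        return g * structure + (1.0 - g) * texture


class FeatureSelectionFFN(nn.Module):
    """Assumed gated feedforward network standing in for the Transformer MLP."""

    def __init__(self, dim: int, expansion: int = 4):
        super().__init__()
        hidden = dim * expansion
        self.value = nn.Linear(dim, hidden)
        self.select = nn.Sequential(nn.Linear(dim, hidden), nn.Sigmoid())
        self.out = nn.Linear(hidden, dim)
        self.act = nn.GELU()

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # An element-wise selection mask suppresses uninformative channels
        # before projecting back to the token dimension.
        return self.out(self.act(self.value(tokens)) * self.select(tokens))


if __name__ == "__main__":
    fusion = GatedFeatureFusion(channels=64)
    fused = fusion(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
    ffn = FeatureSelectionFFN(dim=256)
    refined = ffn(torch.randn(1, 1024, 256))
    print(fused.shape, refined.shape)
```

Under these assumptions, the fusion module would sit between the dual-stream encoder and the decoder at each scale, while the feedforward module would replace the MLP inside each Transformer block; how the paper actually wires them is not stated in the abstract.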
Journal description:
The IET Image Processing journal encompasses research areas related to the generation, processing and communication of visual information. The focus of the journal is the coverage of the latest research results in image and video processing, including image generation and display, enhancement and restoration, segmentation, colour and texture analysis, coding and communication, implementations and architectures as well as innovative applications.
Principal topics include:
Generation and Display - Imaging sensors and acquisition systems, illumination, sampling and scanning, quantization, colour reproduction, image rendering, display and printing systems, evaluation of image quality.
Processing and Analysis - Image enhancement, restoration, segmentation, registration, multispectral, colour and texture processing, multiresolution processing and wavelets, morphological operations, stereoscopic and 3-D processing, motion detection and estimation, video and image sequence processing.
Implementations and Architectures - Image and video processing hardware and software, design and construction, architectures and software, neural, adaptive, and fuzzy processing.
Coding and Transmission - Image and video compression and coding, compression standards, noise modelling, visual information networks, streamed video.
Retrieval and Multimedia - Storage of images and video, database design, image retrieval, video annotation and editing, mixed media incorporating visual information, multimedia systems and applications, image and video watermarking, steganography.
Applications - Innovative application of image and video processing technologies to any field, including life sciences, earth sciences, astronomy, document processing and security.
Current Special Issue Calls for Papers:
Evolutionary Computation for Image Processing - https://digital-library.theiet.org/files/IET_IPR_CFP_EC.pdf
AI-Powered 3D Vision - https://digital-library.theiet.org/files/IET_IPR_CFP_AIPV.pdf
Multidisciplinary advancement of Imaging Technologies: From Medical Diagnostics and Genomics to Cognitive Machine Vision, and Artificial Intelligence - https://digital-library.theiet.org/files/IET_IPR_CFP_IST.pdf
Deep Learning for 3D Reconstruction - https://digital-library.theiet.org/files/IET_IPR_CFP_DLR.pdf