{"title":"AFAN: An Attention-Driven Forgery Adversarial Network for Blind Image Inpainting","authors":"Jiahao Wang;Gang Pan;Di Sun;Jinyuan Li;Jiawan Zhang","doi":"10.1109/TMM.2025.3590914","DOIUrl":null,"url":null,"abstract":"Blind image inpainting is a challenging task aimed at reconstructing corrupted regions without relying on mask information. Due to the lack of mask priors, previous methods usually integrate a mask prediction network in the initial phase, followed by an inpainting backbone. However, this multi-stage generation process may result in feature misalignment. While recent end-to-end generative methods bypass the mask prediction step, they typically struggle with weak perception of contaminated regions and introduce structural distortions. This study presents a novel mask region perception strategy for blind image inpainting by combining adversarial training with forgery detection. To implement this strategy, we propose an attention-driven forgery adversarial network (AFAN), which leverages adaptive contextual attention (ACA) blocks for effective feature modulation. Specifically, within the generator, ACA employs self-attention to enhance content reconstruction by utilizing the rich contextual information of adjacent tokens. In the discriminator, ACA utilizes cross-attention with noise priors to guide adversarial learning for forgery detection. Moreover, we design a high-frequency omni-dimensional dynamic convolution (HODC) based on edge feature enhancement to improve detail representation. Extensive evaluations across multiple datasets demonstrate that the proposed AFAN model outperforms existing generative methods in blind image inpainting, particularly in terms of quality and texture fidelity.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"27 ","pages":"6845-6856"},"PeriodicalIF":9.7000,"publicationDate":"2025-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Multimedia","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/11086395/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Abstract
Blind image inpainting is a challenging task aimed at reconstructing corrupted regions without relying on mask information. Due to the lack of mask priors, previous methods usually integrate a mask prediction network in the initial phase, followed by an inpainting backbone. However, this multi-stage generation process may result in feature misalignment. While recent end-to-end generative methods bypass the mask prediction step, they typically struggle with weak perception of contaminated regions and introduce structural distortions. This study presents a novel mask region perception strategy for blind image inpainting by combining adversarial training with forgery detection. To implement this strategy, we propose an attention-driven forgery adversarial network (AFAN), which leverages adaptive contextual attention (ACA) blocks for effective feature modulation. Specifically, within the generator, ACA employs self-attention to enhance content reconstruction by utilizing the rich contextual information of adjacent tokens. In the discriminator, ACA utilizes cross-attention with noise priors to guide adversarial learning for forgery detection. Moreover, we design a high-frequency omni-dimensional dynamic convolution (HODC) based on edge feature enhancement to improve detail representation. Extensive evaluations across multiple datasets demonstrate that the proposed AFAN model outperforms existing generative methods in blind image inpainting, particularly in terms of quality and texture fidelity.
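Although the paper's implementation is not reproduced here, the two modules named in the abstract can be made concrete with a short sketch. The PyTorch code below is a speculative illustration based only on the description above: `ACABlockSketch` shows a generator-side self-attention block that modulates a feature map with the context of its spatial tokens, and `HODCSketch` shows a simplified edge-gated convolution in the spirit of HODC. All class names, shapes, and hyperparameters are assumptions, not the authors' design; in particular, the real HODC is an omni-dimensional dynamic convolution, which this channel-gating toy only approximates.

```python
# Speculative sketch, NOT the authors' code. Illustrates (1) an ACA-style
# self-attention block that modulates features with token context, and
# (2) a simplified HODC-style convolution gated by high-frequency (edge)
# cues. All names, shapes, and design details are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ACABlockSketch(nn.Module):
    """Self-attention over spatial tokens, loosely matching the abstract's
    description of the generator-side ACA block (hypothetical layout)."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map -> (B, H*W, C) token sequence
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)
        t = self.norm(tokens)
        ctx, _ = self.attn(t, t, t)   # each token attends to the others
        tokens = tokens + ctx          # residual feature modulation
        return tokens.transpose(1, 2).reshape(b, c, h, w)


class HODCSketch(nn.Module):
    """Toy stand-in for HODC: a fixed Laplacian high-pass branch gates a
    learned convolution, emphasizing edge detail."""

    def __init__(self, channels: int):
        super().__init__()
        lap = torch.tensor([[0., -1., 0.], [-1., 4., -1.], [0., -1., 0.]])
        self.register_buffer("lap", lap.expand(channels, 1, 3, 3).clone())
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels, 1), nn.Sigmoid()
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Depthwise high-pass filtering isolates edge responses.
        edges = F.conv2d(x, self.lap, padding=1, groups=x.shape[1])
        # Pooled edge energy rescales each output channel of the convolution.
        return self.conv(x) * self.gate(edges)


if __name__ == "__main__":
    feat = torch.randn(1, 32, 16, 16)
    print(ACABlockSketch(32)(feat).shape)  # torch.Size([1, 32, 16, 16])
    print(HODCSketch(32)(feat).shape)      # torch.Size([1, 32, 16, 16])
```

The gating design mirrors the abstract's claim that high-frequency edge cues should steer detail reconstruction: the fixed Laplacian branch isolates edges, and its pooled response rescales the learned convolution's output channels.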
Journal Introduction:
The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.