Learning position-aware implicit neural network for real-world face inpainting

IF 7.5 | Zone 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition | Pub Date: 2025-03-18 | DOI: 10.1016/j.patcog.2025.111598

Bo Zhao, Huan Yang, Jianlong Fu

Journal: Pattern Recognition, Volume 165, Article 111598. Publication type: Journal Article. Open access: no. URL: https://www.sciencedirect.com/science/article/pii/S0031320325002584

Citations: 0

Abstract

Face inpainting requires a model to have a precise global understanding of facial position structure. Benefiting from powerful deep learning backbones, recent face inpainting methods achieve decent performance in the ideal setting (square images at 512px). However, when applied directly to arbitrary-shaped images in real-world scenarios, existing methods often produce visually unpleasant results, especially in position-sensitive details (e.g., eyes and nose). These artifacts indicate that existing methods fall short in processing position information. In this paper, we propose an Implicit Neural Inpainting Network (IN$^{2}$) that handles arbitrary-shaped face images in real-world scenarios by explicitly modeling position information. Specifically, a downsample processing encoder is proposed to reduce information loss while obtaining the global semantic feature. A neighbor hybrid attention block with a hybrid attention mechanism is proposed to improve the model's facial understanding without restricting the input shape. Finally, an implicit neural pyramid decoder is introduced to explicitly model position information and bridge the gap between low-resolution features and high-resolution output. Our method achieves the best facial image restoration performance on both the CelebA-HQ and LFW datasets, as well as on the downstream task of face verification, introducing a more efficient face inpainting algorithm to the fields of image editing software and intelligent security.
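The abstract gives no implementation details for the implicit neural pyramid decoder, so the following is only a generic sketch of the underlying idea of coordinate-based (implicit) decoding, not the authors' architecture. It assumes a low-resolution feature map and a tiny MLP with random, untrained weights; every name here (`make_mlp`, `implicit_decode`, `pixel_centers`) is hypothetical. Each output pixel is queried by its continuous coordinate, so the output height and width can be chosen freely and need not be square.

```python
import random


def make_mlp(in_dim, hidden, out_dim, seed=0):
    """Random (untrained) two-layer MLP weights; for illustration only."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(hidden)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(out_dim)]
    return w1, w2


def mlp(params, x):
    """Linear -> ReLU -> Linear, without biases."""
    w1, w2 = params
    h = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in w1]
    return [sum(w * v for w, v in zip(row, h)) for row in w2]


def pixel_centers(n):
    """Centers of n equal cells covering the interval [-1, 1]."""
    return [-1.0 + (2 * i + 1) / n for i in range(n)]


def implicit_decode(feat, out_h, out_w, params):
    """Predict a value at every output pixel center.

    feat: low-resolution feature map, feat[y][x] is a list of C floats.
    The MLP input is the nearest feature vector concatenated with the
    query's offset from that feature cell's center (assumed design).
    """
    in_h, in_w = len(feat), len(feat[0])
    out = []
    for qy in pixel_centers(out_h):
        row = []
        for qx in pixel_centers(out_w):
            # Index of the low-resolution cell that contains the query point.
            iy = min(in_h - 1, int((qy + 1) / 2 * in_h))
            ix = min(in_w - 1, int((qx + 1) / 2 * in_w))
            # Offset from that cell's center, measured in cell widths.
            cy = -1.0 + (2 * iy + 1) / in_h
            cx = -1.0 + (2 * ix + 1) / in_w
            rel = [(qy - cy) * in_h / 2, (qx - cx) * in_w / 2]
            row.append(mlp(params, feat[iy][ix] + rel))
        out.append(row)
    return out


# Usage: a 2x3 feature map with 4 channels, decoded to a non-square 5x7 RGB grid.
rng = random.Random(1)
feat = [[[rng.random() for _ in range(4)] for _ in range(3)] for _ in range(2)]
params = make_mlp(in_dim=4 + 2, hidden=8, out_dim=3)
out = implicit_decode(feat, out_h=5, out_w=7, params=params)
```

Because the position offset is an explicit input to the MLP, the same feature map can be decoded at any other resolution by changing `out_h` and `out_w`, which is the property the abstract attributes to explicit position modeling.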



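Source journal: Pattern Recognition (Engineering & Technology - Engineering: Electrical & Electronic)
CiteScore: 14.40
Self-citation rate: 16.20%
Articles published: 683
Review time: 5.6 months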

About the journal: The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.