Improved fine-tuning of mask-aware transformer for personalized face inpainting with semantic-aware regularization

IF 3.3 · CAS Tier 3 (Computer Science) · JCR Q2 · COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Yuan Zeng, Yijing Sun, Yi Gong
DOI: 10.1016/j.patrec.2025.07.009
Journal: Pattern Recognition Letters, Volume 197, Pages 95-101
Published: 2025-07-25
URL: https://www.sciencedirect.com/science/article/pii/S0167865525002612
Citations: 0

Abstract

Recent advances in generative models have led to significant improvements in the challenging task of high-fidelity image inpainting. How to effectively guide or control these powerful models to perform personalized tasks has become an important open problem. In this letter, we introduce a semantic-aware fine-tuning method for adapting a pre-trained image inpainting model, the mask-aware transformer (MAT), to personalized face inpainting. Unlike existing methods, which tune a personalized generative prior with many reference images, our method can recover the key facial features of an individual with only a few input references. To improve fine-tuning stability in a setting with few reference images, we propose a multiscale semantic-aware regularization that encourages the generated key facial components to match those in the references. Specifically, we generate a mask to extract the key facial components as prior knowledge and impose a semantic-based regularization on these regions at multiple scales, which significantly improves the fidelity and identity preservation of facial components. Extensive experiments demonstrate that our method can generate high-fidelity personalized face inpainting results using only three reference images, far fewer than required by personalized inpainting baselines.
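The abstract describes penalizing mismatches between generated and reference facial components, restricted to a component mask and averaged over several scales. The paper's exact formulation is not given here, so the following is only a minimal illustrative sketch of that idea: a masked L1 penalty computed at a few downsampled scales. The function name, the strided downsampling, and the L1 distance are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def multiscale_semantic_loss(generated, reference, component_mask, scales=(1, 2, 4)):
    """Illustrative sketch: masked L1 regularization over multiple scales.

    generated, reference: 2-D float arrays (e.g. grayscale crops).
    component_mask: binary array marking key facial components (eyes, mouth, ...).
    scales: strides used to form coarser versions of each input.
    """
    total = 0.0
    for s in scales:
        # crude downsampling by strided slicing; a real system would
        # likely use proper pooling or feature maps from a network
        g = generated[::s, ::s]
        r = reference[::s, ::s]
        m = component_mask[::s, ::s]
        # L1 distance inside the masked region, normalized by masked area
        total += np.abs((g - r) * m).sum() / max(m.sum(), 1)
    return total / len(scales)
```

Differences outside the masked components contribute nothing, so the penalty only constrains the identity-bearing regions, which matches the stated goal of promoting fidelity of facial components rather than the whole image.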
Source journal

Pattern Recognition Letters (Engineering/Technology – Computer Science: Artificial Intelligence)
CiteScore: 12.40
Self-citation rate: 5.90%
Annual publications: 287
Review time: 9.1 months
Aims and scope: Pattern Recognition Letters aims at rapid publication of concise articles of a broad interest in pattern recognition. Subject areas include all the current fields of interest represented by the Technical Committees of the International Association of Pattern Recognition, and other developing themes involving learning and recognition.