FaceChain-MMID: Generating highly identity-consistent realistic portraits via dividing & merging multi-modal representations

Impact factor: 7.6 · Zone 1, Computer Science · JCR Q1, Computer Science, Artificial Intelligence
Chao Xu, Fei Wang, Cheng Yu, Baigui Sun, Jian Zhao
Journal: Pattern Recognition, volume 168, Article 111858
Publication date: 2025-05-28
Publication type: Journal Article
DOI: 10.1016/j.patcog.2025.111858
URL: https://www.sciencedirect.com/science/article/pii/S0031320325005187
Open access: no
Citations: 0

Abstract

Recent advances in text-to-image generation have made significant strides in customizing realistic human photos. Most existing methods focus on efficiency, aiming to avoid resource-intensive and time-consuming subject-specific fine-tuning. However, they lack an in-depth exploration of identity preservation and therefore degrade significantly in real scenarios. We propose FaceChain-MMID in response to this challenge. First, we comprehensively represent facial identity using three factors: the face image, which provides the basic identity; the segmentation mask, which refines the facial geometry; and text prompts, which supplement additional identity-related attributes. Building upon these multi-modal features, we propose a novel dividing and merging strategy to support highly identity-consistent personalized portrait generation. Specifically, the dividing stage ensures that each modality fully expresses its own information by training an independent uni-modal conditional diffusion model for each. The subsequent merging stage introduces an efficient modal-specific proxy module that combines the noise predicted by each branch at the latent denoising steps, and it incorporates identity-centric face pairs, a filtering mechanism, and a truncated loss to enhance inter-modal complementarity. Extensive qualitative and quantitative experiments demonstrate the superior performance of our approach in preserving identity.
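The merging stage described above combines the noise predicted by each uni-modal branch at a denoising step. The following is only a minimal sketch of that general idea, not the paper's proxy module: the abstract does not specify how the module is built or trained, so the per-modality scores, the softmax weighting, and the names `softmax` and `merge_noise` are all assumptions introduced here for illustration. In the actual method the combination is learned.

```python
import math
from typing import Dict, List


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Turn arbitrary per-modality scores into weights that sum to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {name: math.exp(value - peak) for name, value in scores.items()}
    total = sum(exps.values())
    return {name: value / total for name, value in exps.items()}


def merge_noise(
    branch_noise: Dict[str, List[float]],
    scores: Dict[str, float],
) -> List[float]:
    """Blend the noise predictions of several branches into one prediction.

    branch_noise maps a modality name (hypothetical: "face", "mask", "text")
    to that branch's flattened noise prediction for the current step.
    scores are placeholder values standing in for whatever a learned
    module would output; they are NOT taken from the paper.
    """
    if branch_noise.keys() != scores.keys():
        raise ValueError("every branch needs exactly one score")
    lengths = {len(values) for values in branch_noise.values()}
    if len(lengths) != 1:
        raise ValueError("all branches must predict noise of the same size")

    weights = softmax(scores)
    size = lengths.pop()
    # Element-wise convex combination of the branch predictions.
    return [
        sum(weights[name] * branch_noise[name][i] for name in branch_noise)
        for i in range(size)
    ]


if __name__ == "__main__":
    noise = {
        "face": [0.3, -0.3],
        "mask": [0.0, 0.6],
        "text": [0.6, 0.0],
    }
    # Equal scores give equal weights, so the result is the plain average.
    equal_scores = {"face": 0.0, "mask": 0.0, "text": 0.0}
    print(merge_noise(noise, equal_scores))
```

In a real sampler, a function like this would be called once per denoising step, with the merged prediction passed to the scheduler in place of a single model's output. Raising one modality's score shifts the blend toward that branch while keeping the weights normalized.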
Source journal
Pattern Recognition (category: Engineering Technology, Engineering: Electrical & Electronic)
CiteScore: 14.40
Self-citation rate: 16.20%
Articles published: 683
Review time: 5.6 months
Journal description: The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.