Parts2Whole: Generalizable Multi-Part Portrait Customization

IF 13.7
Hongxing Fan;Zehuan Huang;Lipeng Wang;Haohua Chen;Li Yin;Lu Sheng
DOI: 10.1109/TIP.2025.3597037
Journal: IEEE Transactions on Image Processing, vol. 34, pp. 5241-5256
Published: 2025-08-14 (Journal Article)
IEEE Xplore: https://ieeexplore.ieee.org/document/11125861/
Project page: https://huanngzh.github.io/Parts2Whole/
Citations: 0

Abstract

Multi-part portrait customization aims to generate realistic human images by assembling specified body parts from multiple reference images, with significant applications in digital human creation. Existing customization methods typically follow two approaches: 1) test-time fine-tuning, which learns concepts effectively but is time-consuming and struggles with multi-part composition; 2) generalizable feed-forward methods, which offer efficiency but lack fine control over appearance specifics. To address these limitations, we present Parts2Whole, a diffusion-based generalizable portrait generator that harmoniously integrates multiple reference parts into high-fidelity human images via our proposed multi-reference mechanism. To adequately characterize each part, we propose a detail-aware appearance encoder, which is initialized from the pre-trained denoising U-Net and inherits its powerful image priors, enabling the encoding of detailed information from reference images. The extracted features are incorporated into the denoising U-Net by a shared self-attention mechanism, enhanced by mask information for precise part selection. Additionally, we integrate pose map conditioning to control the target posture of generated portraits, facilitating more flexible customization. Extensive experiments demonstrate the superiority of our approach over existing methods and its applicability to related tasks such as pose transfer and pose-guided human image generation, showcasing its versatile conditioning. Our project is available at https://huanngzh.github.io/Parts2Whole/
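To make the abstract's key mechanism concrete: the reference features are injected into the denoising branch through self-attention whose keys and values are extended with reference tokens, with part masks restricting which reference tokens are attendable. The sketch below is illustrative only, assuming a simplified single-head, unbatched setting with numpy; the function and variable names are hypothetical and are not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def shared_self_attention(target, references, ref_masks):
    """Toy reference-augmented ("shared") self-attention.

    target:     (n_t, d) tokens from the denoising branch.
    references: list of (n_r, d) token arrays from an appearance encoder.
    ref_masks:  list of (n_r,) boolean arrays; True marks tokens of the
                selected body part in that reference image.
    """
    d = target.shape[1]
    # Mask-based part selection: keep only the selected-part tokens of
    # each reference, then let the target attend over its own tokens
    # plus those reference tokens.
    kept = [r[m] for r, m in zip(references, ref_masks)]
    kv = np.concatenate([target] + kept, axis=0)         # (n_t + n_kept, d)
    attn = softmax(target @ kv.T / np.sqrt(d), axis=-1)  # rows sum to 1
    return attn @ kv                                     # (n_t, d)
```

The design point this sketch captures is that queries come only from the denoising branch, so the output keeps the target's token layout while mixing in appearance detail from exactly the masked parts of each reference.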