DP-Adapter: Dual-pathway adapter for boosting fidelity and text consistency in customizable human image generation

Impact Factor 2.2 · CAS Tier 4 (Computer Science) · JCR Q2 (Computer Science, Software Engineering)
Ye Wang, Ruiqi Liu, Xuping Xie, Lanjun Wang, Zili Yi, Rui Ma
{"title":"DP-Adapter: Dual-pathway adapter for boosting fidelity and text consistency in customizable human image generation","authors":"Ye Wang ,&nbsp;Ruiqi Liu ,&nbsp;Xuping Xie ,&nbsp;Lanjun Wang ,&nbsp;Zili Yi ,&nbsp;Rui Ma","doi":"10.1016/j.gmod.2025.101292","DOIUrl":null,"url":null,"abstract":"<div><div>With the growing popularity of personalized human content creation and sharing, there is a rising demand for advanced techniques in customized human image generation. However, current methods struggle to simultaneously maintain the fidelity of human identity and ensure the consistency of textual prompts, often resulting in suboptimal outcomes. This shortcoming is primarily due to the lack of effective constraints during the simultaneous integration of visual and textual prompts, leading to unhealthy mutual interference that compromises the full expression of both types of input. Building on prior research that suggests visual and textual conditions influence different regions of an image in distinct ways, we introduce a novel Dual-Pathway Adapter (DP-Adapter) to enhance both high-fidelity identity preservation and textual consistency in personalized human image generation. Our approach begins by decoupling the target human image into visually sensitive and text-sensitive regions. For visually sensitive regions, DP-Adapter employs an Identity-Enhancing Adapter (IEA) to preserve detailed identity features. For text-sensitive regions, we introduce a Textual-Consistency Adapter (TCA) to minimize visual interference and ensure the consistency of textual semantics. To seamlessly integrate these pathways, we develop a Fine-Grained Feature-Level Blending (FFB) module that efficiently combines hierarchical semantic features from both pathways, resulting in more natural and coherent synthesis outcomes. Additionally, DP-Adapter supports various innovative applications, including controllable headshot-to-full-body portrait generation, age editing, old-photo to reality, and expression editing. Extensive experiments demonstrate that DP-Adapter outperforms state-of-the-art methods in both visual fidelity and text consistency, highlighting its effectiveness and versatility in the field of human image generation.</div></div>","PeriodicalId":55083,"journal":{"name":"Graphical Models","volume":"141 ","pages":"Article 101292"},"PeriodicalIF":2.2000,"publicationDate":"2025-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Graphical Models","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1524070325000396","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
引用次数: 0

Abstract

With the growing popularity of personalized human content creation and sharing, there is a rising demand for advanced techniques in customized human image generation. However, current methods struggle to simultaneously maintain the fidelity of human identity and ensure the consistency of textual prompts, often resulting in suboptimal outcomes. This shortcoming is primarily due to the lack of effective constraints during the simultaneous integration of visual and textual prompts, leading to unhealthy mutual interference that compromises the full expression of both types of input. Building on prior research that suggests visual and textual conditions influence different regions of an image in distinct ways, we introduce a novel Dual-Pathway Adapter (DP-Adapter) to enhance both high-fidelity identity preservation and textual consistency in personalized human image generation. Our approach begins by decoupling the target human image into visually sensitive and text-sensitive regions. For visually sensitive regions, DP-Adapter employs an Identity-Enhancing Adapter (IEA) to preserve detailed identity features. For text-sensitive regions, we introduce a Textual-Consistency Adapter (TCA) to minimize visual interference and ensure the consistency of textual semantics. To seamlessly integrate these pathways, we develop a Fine-Grained Feature-Level Blending (FFB) module that efficiently combines hierarchical semantic features from both pathways, resulting in more natural and coherent synthesis outcomes. Additionally, DP-Adapter supports various innovative applications, including controllable headshot-to-full-body portrait generation, age editing, old-photo to reality, and expression editing. Extensive experiments demonstrate that DP-Adapter outperforms state-of-the-art methods in both visual fidelity and text consistency, highlighting its effectiveness and versatility in the field of human image generation.
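The abstract describes the architecture only at a high level; no implementation details are given. The following is a minimal, hypothetical sketch of how two adapter pathways and a feature-level blend could be wired around the hidden states of a diffusion U-Net block. All module names, shapes, and the mask-gated blending rule are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class DualPathwayAdapterBlock(nn.Module):
    """Illustrative sketch (not the authors' implementation): two lightweight
    adapter pathways blended per spatial token with a soft region mask
    (1 = visually sensitive / identity region, 0 = text-sensitive region)."""

    def __init__(self, dim: int, id_dim: int, txt_dim: int):
        super().__init__()
        # Identity-Enhancing Adapter (IEA) pathway: attends to identity features.
        self.iea_attn = nn.MultiheadAttention(dim, num_heads=8, kdim=id_dim,
                                              vdim=id_dim, batch_first=True)
        # Textual-Consistency Adapter (TCA) pathway: attends to text embeddings.
        self.tca_attn = nn.MultiheadAttention(dim, num_heads=8, kdim=txt_dim,
                                              vdim=txt_dim, batch_first=True)
        # Fine-grained feature-level blend: a learned gate refines the coarse mask.
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.SiLU(),
                                  nn.Linear(dim, 1), nn.Sigmoid())

    def forward(self, hidden, id_tokens, txt_tokens, region_mask):
        # hidden:      (B, N, dim)   U-Net tokens at one resolution
        # id_tokens:   (B, K, id_dim)  identity (face) embeddings
        # txt_tokens:  (B, L, txt_dim) prompt embeddings
        # region_mask: (B, N, 1)    soft mask of visually sensitive regions
        iea_out, _ = self.iea_attn(hidden, id_tokens, id_tokens)
        tca_out, _ = self.tca_attn(hidden, txt_tokens, txt_tokens)
        # Blend the two pathways: the coarse region mask is modulated by a
        # learned, feature-dependent gate before mixing the residuals.
        w = region_mask * self.gate(torch.cat([iea_out, tca_out], dim=-1))
        return hidden + w * iea_out + (1.0 - w) * tca_out
```

The key idea this sketch tries to capture is that identity conditioning is confined to visually sensitive regions while text conditioning dominates elsewhere, so the two signals do not compete over the same tokens; how the paper actually derives the region mask and performs the fine-grained blending is specified only in the full text.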


Source journal
Graphical Models (Engineering & Technology – Computer Science: Software Engineering)
CiteScore: 3.60
Self-citation rate: 5.90%
Articles published: 15
Review time: 47 days
Journal description: Graphical Models is recognized internationally as a highly rated, top-tier journal focused on the creation, geometric processing, animation, and visualization of graphical models and on their applications in engineering, science, culture, and entertainment. GMOD provides its readers with thoroughly reviewed and carefully selected papers that disseminate exciting innovations, teach rigorous theoretical foundations, propose robust and efficient solutions, or describe ambitious systems or applications in a variety of topics. Papers are invited in five categories: research (contributions of novel theoretical or practical approaches or solutions), survey (opinionated views of the state of the art and challenges in a specific topic), system (the architecture and implementation details of an innovative architecture for a complete system that supports model/animation design, acquisition, analysis, or visualization), application (description of a novel application of known techniques and evaluation of its impact), or lecture (an elegant and inspiring perspective on previously published results that clarifies and teaches them in a new way). GMOD offers its authors an accelerated review, feedback from experts in the field, immediate online publication of accepted papers, no restriction on color and length (when justified by the content) in the online version, and broad promotion of published papers. A prestigious group of editors, selected from among the premier international researchers in their fields, oversees the review process.