Multi-View User Preference Modeling for Personalized Text-to-Image Generation

Impact Factor 8.4 | CAS Tier 1 (Computer Science) | JCR Q1 (COMPUTER SCIENCE, INFORMATION SYSTEMS)
Huaiwen Zhang;Tianci Wu;Yinwei Wei
{"title":"Multi-View User Preference Modeling for Personalized Text-to-Image Generation","authors":"Huaiwen Zhang;Tianci Wu;Yinwei Wei","doi":"10.1109/TMM.2025.3557683","DOIUrl":null,"url":null,"abstract":"Personalized text-to-image generation aims to synthesize images tailored to individual user preferences. Existing methods primarily generate customized content using a few reference images, which often struggle to mine user preferences from historical records, and thus fail to synthesize truly personalized content. In addition, it is difficult to directly incorporate the extracted feature of user preferences into the feature space of the generation model, since there exists a considerable gap between them. In this paper, we propose a novel multi-view personalized text-to-image generation method based on the diffusion model, named MVP-Diffusion, which learns instance- and user-level preferences from historical records and integrates them into the generation model. For instance-level user preference modeling, we employ a chain-of-thought prompting strategy to deduce preference keywords and integrate them into input prompts with the aid of a large language model. For user-level preference modeling, we construct a learnable embedding for each user to capture more comprehensive preferences by analyzing their historical records. An adaptive user preference fusion module is proposed to inject user preferences into the generation model via a set of learnable parameters. Experimental results demonstrate that the proposed method significantly enhances the personalization of the generated images compared to the other personalized text-to-image generation methods.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"27 ","pages":"3082-3091"},"PeriodicalIF":8.4000,"publicationDate":"2025-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Multimedia","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10948278/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract

Personalized text-to-image generation aims to synthesize images tailored to individual user preferences. Existing methods primarily generate customized content from a few reference images; they often struggle to mine user preferences from historical records and thus fail to synthesize truly personalized content. In addition, it is difficult to incorporate extracted user-preference features directly into the feature space of the generation model, since there is a considerable gap between the two. In this paper, we propose a novel multi-view personalized text-to-image generation method based on the diffusion model, named MVP-Diffusion, which learns instance- and user-level preferences from historical records and integrates them into the generation model. For instance-level preference modeling, we employ a chain-of-thought prompting strategy to deduce preference keywords and integrate them into the input prompt with the aid of a large language model. For user-level preference modeling, we construct a learnable embedding for each user, capturing more comprehensive preferences by analyzing their historical records. An adaptive user preference fusion module injects these preferences into the generation model via a set of learnable parameters. Experimental results demonstrate that the proposed method significantly enhances the personalization of generated images compared to other personalized text-to-image generation methods.
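The abstract sketches two preference views and a fusion step. To make that concrete, below is a minimal PyTorch sketch of the user-level view (a per-user learnable embedding) and an adaptive fusion module that injects it into the text-conditioning tokens of a diffusion model. Every name, dimension, prompt template, and the gating form here is an illustrative assumption, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

# Hypothetical chain-of-thought template for the instance-level view:
# an LLM is asked to reason over a user's history and emit preference
# keywords that get appended to the input prompt.
COT_TEMPLATE = (
    "Here are image prompts this user liked: {history}\n"
    "Step 1: describe the recurring style, subjects, and mood.\n"
    "Step 2: output 3-5 preference keywords to add to a new prompt."
)

class UserPreferenceBank(nn.Module):
    """User-level view: one learnable embedding per user."""
    def __init__(self, num_users: int, dim: int = 768):
        super().__init__()
        self.embeddings = nn.Embedding(num_users, dim)

    def forward(self, user_ids: torch.Tensor) -> torch.Tensor:
        return self.embeddings(user_ids)  # (B, dim)

class AdaptiveFusion(nn.Module):
    """Fuses the user embedding into the text-condition tokens through
    a learned per-token gate, so cross-attention can pick it up."""
    def __init__(self, dim: int = 768):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, text_tokens: torch.Tensor, user_emb: torch.Tensor) -> torch.Tensor:
        # text_tokens: (B, T, dim), e.g. encoded prompt; user_emb: (B, dim)
        u = self.proj(user_emb).unsqueeze(1).expand_as(text_tokens)
        g = self.gate(torch.cat([text_tokens, u], dim=-1))  # per-token gate
        return text_tokens + g * u  # fused condition, (B, T, dim)

# Usage sketch with stand-in tensors:
bank, fusion = UserPreferenceBank(num_users=1000), AdaptiveFusion()
text = torch.randn(2, 77, 768)  # stand-in for CLIP-style text features
fused = fusion(text, bank(torch.tensor([3, 7])))  # (2, 77, 768)
```

In a pipeline like the one the abstract describes, the fused tokens would presumably replace the plain text features fed to the denoiser's cross-attention, with the embedding bank and gate acting as the "set of learnable parameters" trained on each user's historical records.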
Source Journal

IEEE Transactions on Multimedia (Engineering & Technology - Telecommunications)
CiteScore: 11.70
Self-citation rate: 11.00%
Articles per year: 576
Review time: 5.5 months
Journal description: The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.