Prompt-affinity multi-modal class centroids for unsupervised domain adaption

IF 7.5; CAS Tier 1 (Computer Science); Q1 in COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Xingwei Deng, Yangtao Wang, Yanzhao Xie, Xiaocui Li, Maobin Tang, Meie Fang, Wensheng Zhang
{"title":"Prompt-affinity multi-modal class centroids for unsupervised domain adaption","authors":"Xingwei Deng ,&nbsp;Yangtao Wang ,&nbsp;Yanzhao Xie ,&nbsp;Xiaocui Li ,&nbsp;Maobin Tang ,&nbsp;Meie Fang ,&nbsp;Wensheng Zhang","doi":"10.1016/j.patcog.2025.112095","DOIUrl":null,"url":null,"abstract":"<div><div>In recent years, the advancements in large vision-language models (VLMs) like CLIP have sparked a renewed interest in leveraging the prompt learning mechanism to preserve semantic consistency between source and target domains in unsupervised domain adaption (UDA). While these approaches show promising results, they encounter fundamental limitations when quantifying the similarity between source and target domain data, primarily stemming from the redundant and modality-missing class centroids. To address these limitations, we propose <u><strong>P</strong></u>rompt-affinity <u><strong>M</strong></u>ulti-modal <u><strong>C</strong></u>lass <u><strong>C</strong></u>entroids for UDA (termed as PMCC). Firstly, we fuse the text class centroids (directly generated from the text encoder of CLIP with manual prompts for each class) and image class centroids (generated from the image encoder of CLIP for each class based on source domain images) to yield the multi-modal class centroids. Secondly, we conduct the cross-attention operation between each source or target domain image and these multi-modal class centroids. In this way, these class centroids that contain rich semantic information of each class will serve as a bridge to effectively measure the semantic similarity between different domains. Finally, we design a logit bias head and employ a multi-modal prompt learning mechanism to accurately predict the true class of each image for both source and target domains. We conduct extensive experiments on 4 popular UDA datasets including Office-31, Office-Home, VisDA-2017, and DomainNet. The experimental results validate our PMCC achieves higher performance with lower model complexity than the state-of-the-art (SOTA) UDA methods. The code of this project is available at GitHub: <span><span>https://github.com/246dxw/PMCC</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"170 ","pages":"Article 112095"},"PeriodicalIF":7.5000,"publicationDate":"2025-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pattern Recognition","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0031320325007551","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

In recent years, advances in large vision-language models (VLMs) such as CLIP have sparked renewed interest in leveraging the prompt learning mechanism to preserve semantic consistency between source and target domains in unsupervised domain adaption (UDA). While these approaches show promising results, they encounter fundamental limitations when quantifying the similarity between source and target domain data, primarily stemming from redundant and modality-missing class centroids. To address these limitations, we propose Prompt-affinity Multi-modal Class Centroids for UDA (termed PMCC). First, we fuse the text class centroids (generated directly by the text encoder of CLIP from manual prompts for each class) and the image class centroids (generated by the image encoder of CLIP for each class from source domain images) to yield the multi-modal class centroids. Second, we perform a cross-attention operation between each source- or target-domain image and these multi-modal class centroids. In this way, the class centroids, which carry rich semantic information for each class, serve as a bridge to effectively measure the semantic similarity between the two domains. Finally, we design a logit bias head and employ a multi-modal prompt learning mechanism to accurately predict the true class of each image in both the source and target domains. We conduct extensive experiments on four popular UDA datasets: Office-31, Office-Home, VisDA-2017, and DomainNet. The experimental results validate that our PMCC achieves higher performance with lower model complexity than state-of-the-art (SOTA) UDA methods. The code of this project is available on GitHub: https://github.com/246dxw/PMCC.
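
To make the centroid construction and cross-attention steps concrete, below is a minimal sketch assuming CLIP text and image features have already been extracted. The function names, the fusion by simple averaging, the single attention head, and the temperature value are illustrative assumptions rather than the paper's exact design; the logit bias head and the multi-modal prompt learning mechanism are not shown (see the authors' repository for the actual implementation).

import torch
import torch.nn.functional as F

def multimodal_centroids(text_feats, source_feats, source_labels, num_classes):
    # text_feats:    (C, d) CLIP text features, one manual prompt per class
    # source_feats:  (N, d) CLIP image features of labeled source-domain images
    # source_labels: (N,)   integer class labels of the source images
    # Assumes every class appears at least once among the source labels.
    image_centroids = torch.stack([
        source_feats[source_labels == c].mean(dim=0)   # per-class mean image feature
        for c in range(num_classes)
    ])                                                  # (C, d) image class centroids
    fused = 0.5 * (F.normalize(text_feats, dim=-1) +
                   F.normalize(image_centroids, dim=-1))  # simple average fusion (assumption)
    return F.normalize(fused, dim=-1)                   # (C, d) multi-modal class centroids

def centroid_logits(image_feats, centroids, tau=0.07):
    # image_feats: (B, d) features of source- or target-domain images (queries)
    # centroids:   (C, d) multi-modal class centroids acting as keys and values
    attn = F.softmax(image_feats @ centroids.t() / tau, dim=-1)   # (B, C) attention weights
    attended = F.normalize(attn @ centroids, dim=-1)              # (B, d) attended representation
    return attended @ centroids.t() / tau                         # (B, C) class logits

In this sketch, predictions for unlabeled target-domain images would be obtained by calling multimodal_centroids on the labeled source features and then taking the argmax of centroid_logits on the target features; in PMCC itself these scores are further refined by the logit bias head and multi-modal prompt learning described in the abstract.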
Source journal
Pattern Recognition (Engineering Technology; Engineering: Electrical & Electronic)
CiteScore: 14.40
Self-citation rate: 16.20%
Articles published: 683
Review time: 5.6 months
Journal description: The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.