Prompt-Based Modality Alignment for Effective Multi-Modal Object Re-Identification

Impact Factor: 13.7
Shizhou Zhang;Wenlong Luo;De Cheng;Yinghui Xing;Guoqiang Liang;Peng Wang;Yanning Zhang
{"title":"基于提示的多模态对象有效再识别的模态对齐方法","authors":"Shizhou Zhang;Wenlong Luo;De Cheng;Yinghui Xing;Guoqiang Liang;Peng Wang;Yanning Zhang","doi":"10.1109/TIP.2025.3556531","DOIUrl":null,"url":null,"abstract":"A critical challenge for multi-modal Object Re-Identification (ReID) is the effective aggregation of complementary information to mitigate illumination issues. State-of-the-art methods typically employ complex and highly-coupled architectures, which unavoidably result in heavy computational costs. Moreover, the significant distribution gap among different image spectra hinders the joint representation of multi-modal features. In this paper, we propose a framework named as PromptMA to establish effective communication channels between different modality paths, thereby aggregating modal complementary information and bridging the distribution gap. Specifically, we inject a series of learnable multi-modal prompts into the Image Encoder and introduce a prompt exchange mechanism to enable the prompts to alternately interact with different modal token embeddings, thus capturing and distributing multi-modal features effectively. Building on top of the multi-modal prompts, we further propose Prompt-based Token Selection (PBTS) and Prompt-based Modality Fusion (PBMF) modules to achieve effective multi-modal feature fusion while minimizing background interference. Additionally, due to the flexibility of our prompt exchange mechanism, our method is well-suited to handle scenarios with missing modalities. Extensive evaluations are conducted on four widely used benchmark datasets and the experimental results demonstrate that our method achieves state-of-the-art performances, surpassing the current benchmarks by over 15% on the challenging MSVR310 dataset and by 6% on the RGBNT201. The code is available at <uri>https://github.com/FHR-L/PromptMA</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"2450-2462"},"PeriodicalIF":13.7000,"publicationDate":"2025-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Prompt-Based Modality Alignment for Effective Multi-Modal Object Re-Identification\",\"authors\":\"Shizhou Zhang;Wenlong Luo;De Cheng;Yinghui Xing;Guoqiang Liang;Peng Wang;Yanning Zhang\",\"doi\":\"10.1109/TIP.2025.3556531\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A critical challenge for multi-modal Object Re-Identification (ReID) is the effective aggregation of complementary information to mitigate illumination issues. State-of-the-art methods typically employ complex and highly-coupled architectures, which unavoidably result in heavy computational costs. Moreover, the significant distribution gap among different image spectra hinders the joint representation of multi-modal features. In this paper, we propose a framework named as PromptMA to establish effective communication channels between different modality paths, thereby aggregating modal complementary information and bridging the distribution gap. Specifically, we inject a series of learnable multi-modal prompts into the Image Encoder and introduce a prompt exchange mechanism to enable the prompts to alternately interact with different modal token embeddings, thus capturing and distributing multi-modal features effectively. 
Building on top of the multi-modal prompts, we further propose Prompt-based Token Selection (PBTS) and Prompt-based Modality Fusion (PBMF) modules to achieve effective multi-modal feature fusion while minimizing background interference. Additionally, due to the flexibility of our prompt exchange mechanism, our method is well-suited to handle scenarios with missing modalities. Extensive evaluations are conducted on four widely used benchmark datasets and the experimental results demonstrate that our method achieves state-of-the-art performances, surpassing the current benchmarks by over 15% on the challenging MSVR310 dataset and by 6% on the RGBNT201. The code is available at <uri>https://github.com/FHR-L/PromptMA</uri>\",\"PeriodicalId\":94032,\"journal\":{\"name\":\"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society\",\"volume\":\"34 \",\"pages\":\"2450-2462\"},\"PeriodicalIF\":13.7000,\"publicationDate\":\"2025-04-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10955143/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10955143/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

A critical challenge for multi-modal Object Re-Identification (ReID) is the effective aggregation of complementary information to mitigate illumination issues. State-of-the-art methods typically employ complex and highly-coupled architectures, which unavoidably result in heavy computational costs. Moreover, the significant distribution gap among different image spectra hinders the joint representation of multi-modal features. In this paper, we propose a framework named PromptMA to establish effective communication channels between different modality paths, thereby aggregating modal complementary information and bridging the distribution gap. Specifically, we inject a series of learnable multi-modal prompts into the Image Encoder and introduce a prompt exchange mechanism that lets the prompts alternately interact with different modal token embeddings, thus capturing and distributing multi-modal features effectively. Building on top of the multi-modal prompts, we further propose Prompt-based Token Selection (PBTS) and Prompt-based Modality Fusion (PBMF) modules to achieve effective multi-modal feature fusion while minimizing background interference. Additionally, due to the flexibility of our prompt exchange mechanism, our method is well-suited to handle scenarios with missing modalities. Extensive evaluations are conducted on four widely used benchmark datasets, and the experimental results demonstrate that our method achieves state-of-the-art performance, surpassing the current benchmarks by over 15% on the challenging MSVR310 dataset and by 6% on RGBNT201. The code is available at https://github.com/FHR-L/PromptMA.
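
The abstract describes the core mechanism in enough detail to sketch the general idea. Below is a minimal illustration, not the authors' implementation, of how learnable prompt sets might be exchanged between two modality streams of a shared Transformer encoder: every other block, the prompt sets swap streams, so each set alternately attends to both modalities' token embeddings. All names, shapes, the number of prompts, and the choice of a standard TransformerEncoderLayer are assumptions made for illustration; the actual PromptMA code (including the PBTS and PBMF modules) is at https://github.com/FHR-L/PromptMA.

```python
# Sketch of a prompt exchange mechanism between two modality streams,
# based only on the abstract's description (not the authors' code).
import torch
import torch.nn as nn

class PromptExchangeEncoder(nn.Module):
    def __init__(self, dim=768, depth=4, num_heads=8, num_prompts=4):
        super().__init__()
        # One learnable prompt set per modality (e.g. RGB and infrared).
        self.prompts_a = nn.Parameter(torch.randn(1, num_prompts, dim) * 0.02)
        self.prompts_b = nn.Parameter(torch.randn(1, num_prompts, dim) * 0.02)
        # Shared encoder blocks processing both modality streams.
        self.blocks = nn.ModuleList([
            nn.TransformerEncoderLayer(dim, num_heads, dim * 4, batch_first=True)
            for _ in range(depth)
        ])
        self.num_prompts = num_prompts

    def forward(self, tokens_a, tokens_b):
        # tokens_*: (B, N, dim) patch-token embeddings of each modality.
        B = tokens_a.size(0)
        p_a = self.prompts_a.expand(B, -1, -1)
        p_b = self.prompts_b.expand(B, -1, -1)
        for i, blk in enumerate(self.blocks):
            # Alternate which prompt set rides with which modality stream,
            # so each set attends to both modalities across depth.
            if i % 2 == 1:
                p_a, p_b = p_b, p_a
            x_a = blk(torch.cat([p_a, tokens_a], dim=1))
            x_b = blk(torch.cat([p_b, tokens_b], dim=1))
            # Split prompts and patch tokens back apart after each block.
            p_a, tokens_a = x_a[:, :self.num_prompts], x_a[:, self.num_prompts:]
            p_b, tokens_b = x_b[:, :self.num_prompts], x_b[:, self.num_prompts:]
        return tokens_a, tokens_b, p_a, p_b

# Example: batch of 2 images per modality, 128 patch tokens of dim 768.
# enc = PromptExchangeEncoder()
# feats_rgb, feats_ir, p_rgb, p_ir = enc(torch.randn(2, 128, 768),
#                                        torch.randn(2, 128, 768))
```

Because each modality keeps its own prompt set, a missing modality could plausibly be handled by skipping its stream while the surviving prompts still carry cross-modal information, which is roughly the flexibility the abstract attributes to the exchange mechanism.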