TJCMNet: An Efficient Vision-Text Joint Identity Clues Mining Network for Visible-Infrared Person Re-Identification

Impact Factor 3.2 · CAS Zone 2 (Engineering & Technology) · JCR Q2, ENGINEERING, ELECTRICAL & ELECTRONIC
ZhuXuan Cheng, ZhiJia Zhang, Huijie Fan, XingQi Na
DOI: 10.1109/LSP.2025.3556784
Journal: IEEE Signal Processing Letters, vol. 32, pp. 1615-1619
Publication date: 2025-04-01 (Journal Article)
URL: https://ieeexplore.ieee.org/document/10946852/
Citations: 0

Abstract

Retrieving images for the Visible-Infrared Person Re-identification (VI-ReID) task is challenging because of the huge modality discrepancy caused by the different imaging principles of RGB and infrared cameras. Existing approaches rely on seeking distinctive information within a unified visual feature space, ignoring the stable identity information carried by textual descriptions. To overcome these problems, this letter proposes a novel Text-vision Joint Clue Mining (TJCM) network that aggregates vision and text features and then distills the joint knowledge to enhance the modality-shared branch. Specifically, we first extract modality-shared and textual features using a parameter-shared vision encoder and a text encoder. Then, a text-vision co-refinement module is proposed to refine the implicit information within the vision and text features and aggregate them into a joint feature. Finally, a heterogeneous distillation alignment loss is introduced to enhance the modality-shared feature through joint knowledge distillation at the feature level and the logit level. Our TJCMNet achieves significant improvements over state-of-the-art methods on three mainstream datasets.
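The abstract describes distilling the text-vision joint branch into the modality-shared branch at both the feature level and the logit level. The letter's exact loss is not given here, so the sketch below is only a hypothetical illustration of that two-level distillation pattern, using common choices: mean-squared error between features and a temperature-softened KL divergence between logits. The names `hda_loss`, `alpha`, `beta`, and the temperature `t` are all assumptions, not the authors' formulation.

```python
import numpy as np

def softmax(x, t=1.0):
    """Temperature-softened softmax over the last axis (numerically stable)."""
    z = x / t
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def feature_distill(shared_feat, joint_feat):
    """Feature-level term: pull the modality-shared feature toward the
    text-vision joint feature (treated as the teacher) with MSE."""
    return float(np.mean((shared_feat - joint_feat) ** 2))

def logit_distill(shared_logits, joint_logits, t=4.0):
    """Logit-level term: KL divergence between temperature-softened class
    distributions of the joint (teacher) and shared (student) branches."""
    p = softmax(joint_logits, t)   # teacher distribution
    q = softmax(shared_logits, t)  # student distribution
    kl = np.sum(p * (np.log(p + 1e-8) - np.log(q + 1e-8)), axis=-1)
    return float(np.mean(kl) * t * t)  # t^2 rescaling, as in standard KD

def hda_loss(shared_feat, joint_feat, shared_logits, joint_logits,
             alpha=1.0, beta=1.0):
    """Hypothetical heterogeneous-distillation-style loss combining both terms."""
    return (alpha * feature_distill(shared_feat, joint_feat)
            + beta * logit_distill(shared_logits, joint_logits))
```

In training, the joint-branch inputs would be detached from the gradient graph so that knowledge flows only from the joint branch into the modality-shared branch; at inference, only the shared branch is kept, so text is not needed at test time.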
Source journal
IEEE Signal Processing Letters (Engineering: Electrical & Electronic)
CiteScore: 7.40
Self-citation rate: 12.80%
Articles per year: 339
Review time: 2.8 months
Journal description: The IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language, and audio processing. Papers published in the Letters can be presented within one year of their appearance at signal processing conferences such as ICASSP, GlobalSIP, and ICIP, and also at several workshops organized by the Signal Processing Society.