{"title":"TJCMNet: An Efficient Vision-Text Joint Identity Clues Mining Network for Visible-Infrared Person Re-Identification","authors":"ZhuXuan Cheng;ZhiJia Zhang;Huijie Fan;XingQi Na","doi":"10.1109/LSP.2025.3556784","DOIUrl":null,"url":null,"abstract":"Retrieving images for Visible-Infrared Person Re-identification task is challenging, because of the huge modality discrepancy caused by the different imaging principle of RGB and infrared cameras. Existing approaches rely on seeking distinctive information within unified visual feature space, ignoring the stable identity information brought by textual description. To overcome these problems, this letter propose a novel Text-vision Joint Clue Mining (TJCM) network to aggregate vision and text features, then distill the joint knowledge for enhancing the modality-shared branch. Specifically, we first extract modality-shared and textual features using a parameter-shared vision encoder and a text encoder. Then, a text-vision co-refinement module is proposed to refine the implicit information within vision feature and text feature, then aggregate them into joint feature. Finally, introduce the heterogeneous distillation alignment loss provides enhancement for modality-shared feature through joint knowledge distillation at feature-level and logit-level. Our TJCMNet achieves significant improvements over the state-of-the-art methods on three mainstream datasets.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1615-1619"},"PeriodicalIF":3.2000,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Signal Processing Letters","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10946852/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0
Abstract
Retrieving images for the Visible-Infrared Person Re-identification task is challenging because of the large modality discrepancy caused by the different imaging principles of RGB and infrared cameras. Existing approaches rely on seeking distinctive information within a unified visual feature space, ignoring the stable identity information provided by textual descriptions. To overcome these problems, this letter proposes a novel Text-vision Joint Clue Mining (TJCM) network that aggregates vision and text features and then distills the joint knowledge to enhance the modality-shared branch. Specifically, we first extract modality-shared and textual features using a parameter-shared vision encoder and a text encoder. Then, a text-vision co-refinement module is proposed to refine the implicit information within the vision and text features and aggregate them into a joint feature. Finally, a heterogeneous distillation alignment loss is introduced to enhance the modality-shared feature through joint knowledge distillation at the feature level and the logit level. Our TJCMNet achieves significant improvements over state-of-the-art methods on three mainstream datasets.
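The abstract describes a three-step pipeline: extract vision and text features, co-refine and fuse them into a joint feature, and distill that joint knowledge back into the modality-shared branch at both the feature level and the logit level. Below is a minimal PyTorch sketch of that flow. The authors' code is not available here, so every name (CoRefinement, hetero_distill_loss), the cross-attention-based fusion, and all hyper-parameters (dim, heads, tau, alpha) are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoRefinement(nn.Module):
    """Hypothetical stand-in for the text-vision co-refinement module:
    each modality attends to the other, then the two are fused."""
    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        self.v2t = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.t2v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, vis: torch.Tensor, txt: torch.Tensor) -> torch.Tensor:
        # vis: (B, Nv, D) visual tokens; txt: (B, Nt, D) text tokens
        vis_ref, _ = self.v2t(vis, txt, txt)   # vision queries attend to text
        txt_ref, _ = self.t2v(txt, vis, vis)   # text queries attend to vision
        # Pool each refined token sequence and fuse into one joint feature
        joint = torch.cat([vis_ref.mean(1), txt_ref.mean(1)], dim=-1)
        return self.fuse(joint)                # (B, D)

def hetero_distill_loss(shared_feat, joint_feat, shared_logits, joint_logits,
                        tau: float = 4.0, alpha: float = 1.0):
    """Assumed form of the heterogeneous distillation alignment loss:
    feature-level MSE plus logit-level KL, with the joint branch detached
    so it acts as the teacher and only the shared branch receives gradients."""
    feat_loss = F.mse_loss(shared_feat, joint_feat.detach())
    logit_loss = F.kl_div(
        F.log_softmax(shared_logits / tau, dim=-1),
        F.softmax(joint_logits.detach() / tau, dim=-1),
        reduction="batchmean",
    ) * tau ** 2
    return feat_loss + alpha * logit_loss

if __name__ == "__main__":
    B, Nv, Nt, D, C = 4, 129, 32, 512, 395   # C: number of training identities (illustrative)
    vis = torch.randn(B, Nv, D)              # tokens from a parameter-shared vision encoder
    txt = torch.randn(B, Nt, D)              # tokens from a text encoder
    refine = CoRefinement(D)
    classifier = nn.Linear(D, C)
    joint = refine(vis, txt)                 # joint vision-text feature
    shared = vis.mean(1)                     # pooled modality-shared feature
    loss = hetero_distill_loss(shared, joint, classifier(shared), classifier(joint))
    print(loss.item())
```

At inference only the modality-shared branch would be needed, which is consistent with the abstract's framing of distillation as an enhancement of that branch rather than a test-time dependency on text.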
Journal Introduction:
The IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language, and audio processing. Papers published in the Letters can be presented within one year of their appearance at signal processing conferences such as ICASSP, GlobalSIP, and ICIP, and also at several workshops organized by the Signal Processing Society.