Lei Shi, Yumao Ma, Yongcai Tao, Haowen Liu, Lin Wei, Yucheng Shi, Yufei Gao
Neurocomputing, Volume 642, Article 130367. Published 13 May 2025. DOI: 10.1016/j.neucom.2025.130367
Bridging modal gaps: A Cross-Modal Feature Complementation and Feature Projection Network for visible-infrared person re-identification
Visible-infrared person re-identification (VI-ReID) is challenging because of the substantial modality gap between infrared (IR) and visible (VIS) images, which stems primarily from their distinct color distributions and textural characteristics. One effective strategy for narrowing this gap is to use feature projection to create a shared embedding space for the modal features. A key research question remains, however: how to align cross-modal features effectively during projection while minimizing information loss. To address this challenge, this paper proposes a Cross-Modal Feature Complementation and Feature Projection Network (FCFPN). Specifically, a modal complementation strategy is introduced to bridge the discrepancies between cross-modal features and facilitate their alignment. Additionally, a cross-modal feature projection mechanism embeds modality-correlated features into the shared feature space, mitigating the feature loss caused by modality differences. Multi-channel and multi-level features are then extracted from the shared space to enrich the overall feature representation. Extensive experiments demonstrate that FCFPN effectively mitigates the modality discrepancy, achieving 84.7% Rank-1 accuracy and 86.9% mAP in the indoor test mode of the SYSU-MM01 dataset, and 93.0% Rank-1 accuracy and 87.3% mAP in the VIS-to-IR test mode of the RegDB dataset, outperforming several state-of-the-art methods.
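The paper's implementation is not reproduced here; purely as an illustration of the shared-space projection idea the abstract describes, the following is a minimal PyTorch sketch. All names (SharedProjection, alignment_loss, the feature dimensions) are hypothetical, and the cosine-based alignment term is a generic stand-in for cross-modal alignment, not the paper's actual loss.

```python
# Minimal sketch: project VIS and IR features into one shared embedding space
# and encourage cross-modal alignment. Hypothetical names and dimensions;
# this is NOT the FCFPN architecture, only an illustration of the idea.
import torch
import torch.nn as nn


class SharedProjection(nn.Module):
    """Embeds visible (VIS) and infrared (IR) features into a shared space."""

    def __init__(self, feat_dim: int = 2048, embed_dim: int = 512):
        super().__init__()
        # Modality-specific projection heads stand in for the backbone branches.
        self.vis_proj = nn.Linear(feat_dim, embed_dim)
        self.ir_proj = nn.Linear(feat_dim, embed_dim)
        # A shared BatchNorm aligns the statistics of the two modalities.
        self.bn = nn.BatchNorm1d(embed_dim)

    def forward(self, vis_feat: torch.Tensor, ir_feat: torch.Tensor):
        vis_emb = self.bn(self.vis_proj(vis_feat))
        ir_emb = self.bn(self.ir_proj(ir_feat))
        return vis_emb, ir_emb


def alignment_loss(vis_emb: torch.Tensor, ir_emb: torch.Tensor) -> torch.Tensor:
    # One simple alignment objective: pull the two modality embeddings of the
    # same identity together (assumes identity-paired batches).
    return (1 - nn.functional.cosine_similarity(vis_emb, ir_emb)).mean()


if __name__ == "__main__":
    model = SharedProjection()
    vis = torch.randn(8, 2048)  # batch of VIS backbone features
    ir = torch.randn(8, 2048)   # batch of IR backbone features
    v, r = model(vis, ir)
    print(alignment_loss(v, r).item())
```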
Journal introduction:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. The essential topics covered are neurocomputing theory, practice, and applications.