Lei Shi, Yumao Ma, Yongcai Tao, Haowen Liu, Lin Wei, Yucheng Shi, Yufei Gao
Neurocomputing, Volume 642, Article 130367. Published 13 May 2025. DOI: 10.1016/j.neucom.2025.130367
Bridging modal gaps: A Cross-Modal Feature Complementation and Feature Projection Network for visible-infrared person re-identification
Visible-infrared person re-identification (VI-ReID) is challenging because of the substantial modality gap between infrared (IR) and visible (VIS) images, which stems primarily from their distinct color distributions and textural characteristics. One effective strategy for narrowing this gap is to use feature projection to create a shared embedding space for the modal features. A key research question remains, however: how to align cross-modal features effectively during projection while minimizing information loss. To address this challenge, this paper proposes a Cross-Modal Feature Complementation and Feature Projection Network (FCFPN). Specifically, a modal complementation strategy is introduced to bridge the discrepancies between cross-modal features and facilitate their alignment. Additionally, a cross-modal feature projection mechanism embeds modality-correlated features into the shared feature space, mitigating the feature loss caused by modality differences. Multi-channel and multi-level features are then extracted from the shared space to enrich the overall feature representation. Extensive experiments demonstrate that FCFPN effectively mitigates the modality discrepancy, achieving 84.7% Rank-1 accuracy and 86.9% mAP in the indoor test mode of the SYSU-MM01 dataset, and 93.0% Rank-1 accuracy and 87.3% mAP in the VIS-to-IR test mode of the RegDB dataset, outperforming several state-of-the-art methods.
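The paper's implementation is not reproduced here; purely as an illustration of the shared-space projection idea the abstract describes, the following is a minimal PyTorch sketch. All names (SharedProjection, alignment_loss, the feature dimensions) are hypothetical, and the cosine-based alignment term is a generic stand-in for cross-modal alignment, not the paper's actual loss.

```python
# Minimal sketch: project VIS and IR features into one shared embedding space
# and encourage cross-modal alignment. Hypothetical names and dimensions;
# this is NOT the FCFPN architecture, only an illustration of the idea.
import torch
import torch.nn as nn


class SharedProjection(nn.Module):
    """Embeds visible (VIS) and infrared (IR) features into a shared space."""

    def __init__(self, feat_dim: int = 2048, embed_dim: int = 512):
        super().__init__()
        # Modality-specific projection heads stand in for the backbone branches.
        self.vis_proj = nn.Linear(feat_dim, embed_dim)
        self.ir_proj = nn.Linear(feat_dim, embed_dim)
        # A shared BatchNorm aligns the statistics of the two modalities.
        self.bn = nn.BatchNorm1d(embed_dim)

    def forward(self, vis_feat: torch.Tensor, ir_feat: torch.Tensor):
        vis_emb = self.bn(self.vis_proj(vis_feat))
        ir_emb = self.bn(self.ir_proj(ir_feat))
        return vis_emb, ir_emb


def alignment_loss(vis_emb: torch.Tensor, ir_emb: torch.Tensor) -> torch.Tensor:
    # One simple alignment objective: pull the two modality embeddings of the
    # same identity together (assumes identity-paired batches).
    return (1 - nn.functional.cosine_similarity(vis_emb, ir_emb)).mean()


if __name__ == "__main__":
    model = SharedProjection()
    vis = torch.randn(8, 2048)  # batch of VIS backbone features
    ir = torch.randn(8, 2048)   # batch of IR backbone features
    v, r = model(vis, ir)
    print(alignment_loss(v, r).item())
```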
Journal introduction:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. The essential topics covered are neurocomputing theory, practice, and applications.