Multi-granularity feature intersection learning for visible-infrared person re-identification

IF 4.6 | Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Complex & Intelligent Systems | Pub Date: 2025-05-14 | DOI: 10.1007/s40747-025-01853-5

Sixian Chan, Jie Wang, Jiaao Cui, Jie Hu, Zhuorong Li, Jiafa Mao


Citations: 0

Abstract

This paper proposes a multi-granularity feature intersection network (MGFINet) for visible-infrared person re-identification (VI-ReID). VI-ReID aims to retrieve images of the same pedestrian across cameras operating in different spectra. The key challenge is to extract pedestrian descriptions that are both discriminative between classes and similar within a class. Previous methods overlook the potential loss of detail during representation extraction and the data bias in the metric function, which limits further improvement in retrieval performance. In addition, the discrepancy between how the loss is calculated for representation learning and for metric learning affects model training. To address these issues, MGFINet comprises three components: a hierarchical part pooling method (HPP), a hierarchical part restriction method (HPC), and a feature intersection (FI) loss. HPP adopts a hierarchical framework to extract multi-granularity pedestrian representations and performs inter-layer fusion to exploit the high-resolution information of shallow layers and the semantic representational power of deep layers. HPP also applies part pooling with different step sizes to capture pedestrian details in each layer. HPC spreads the identity loss across all layers, shortening the gradient backpropagation path and further optimizing fine-grained features in shallow layers. The FI loss combines representation and metric learning by incorporating hyperparameters of classifiers into metric learning, which mitigates data bias and reduces the gap between the two learning processes. Finally, extensive experiments on two public datasets, SYSU-MM01 and RegDB, demonstrate the effectiveness of the proposed method.
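The part pooling idea in the abstract can be illustrated with a minimal sketch. It assumes that "part pooling with different step sizes" means average-pooling a feature map into horizontal stripes of different heights, which is a common reading in person re-identification but is not confirmed by the abstract. The function names `part_pool` and `multi_granularity_parts`, the granularities `(1, 2, 4)`, and the nested-list `C x H x W` layout are all hypothetical choices for illustration, not the authors' implementation.

```python
def part_pool(feature_map, num_parts):
    """Average-pool a C x H x W feature map (nested lists) into
    `num_parts` horizontal stripes; returns num_parts vectors of length C.

    Assumption: a "part" is a horizontal stripe of equal height.
    """
    channels = len(feature_map)
    height = len(feature_map[0])
    if num_parts <= 0 or height % num_parts != 0:
        raise ValueError("height must be divisible by a positive num_parts")
    stripe = height // num_parts  # rows per stripe (the pooling step)

    parts = []
    for p in range(num_parts):
        rows = range(p * stripe, (p + 1) * stripe)
        vector = []
        for c in range(channels):
            # Collect every value of channel c inside the current stripe.
            values = [v for r in rows for v in feature_map[c][r]]
            vector.append(sum(values) / len(values))
        parts.append(vector)
    return parts


def multi_granularity_parts(feature_map, granularities=(1, 2, 4)):
    """Pool the same map at several stripe counts (hypothetical defaults)."""
    return {g: part_pool(feature_map, g) for g in granularities}


# Toy example: 1 channel, height 4, width 2.
toy_map = [[[1.0, 1.0], [3.0, 3.0], [5.0, 5.0], [7.0, 7.0]]]
pooled = multi_granularity_parts(toy_map)
```

With one stripe the pooled vector describes the whole body; with more stripes each vector covers a narrower region, which is one way coarse and fine descriptions could coexist. How MGFINet assigns granularities to layers and fuses them is not specified in the abstract, so that part is left out.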


Source journal

Complex & Intelligent Systems (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

CiteScore: 9.60

Self-citation rate: 10.30%

Articles published: 297

About the journal: Complex & Intelligent Systems aims to provide a forum for presenting and discussing novel approaches, tools and techniques meant for attaining a cross-fertilization between the broad fields of complex systems, computational simulation, and intelligent analytics and visualization. The transdisciplinary research that the journal focuses on will expand the boundaries of our understanding by investigating the principles and processes that underlie many of the most profound problems facing society today.