{"title":"Semi-supervised cross-modality person re-identification based on pseudo label learning","authors":"Fei Wu , Ruixuan Zhou , Yang Gao , Yujian Feng , Qinghua Huang , Xiao-Yuan Jing","doi":"10.1016/j.imavis.2025.105602","DOIUrl":null,"url":null,"abstract":"<div><div>Visible-infrared person re-identification (RGB-IR Re-ID) aims to find images of the same identity from different modalities. In practice, multiple person and cameras can provide abundant training samples and non-negligible modality differences makes manual labeling of all samples be impractical. How to accurately re-identify cross-modality pedestrians under the training condition of having few labeled samples and a quantity of unlabeled samples is an important research question. However, person re-identification in this scenario, which we call Semi-Supervised Cross-Modality Re-ID (SSCM Re-ID), has not been well studied. In this paper, we propose a cross-modality pseudo label learning (CPL) framework for SSCM Re-ID task. It consists of three modules: the feature mapping module, the identity alignment module and the pseudo-label generation module. The feature mapping module is designed to extract shared discriminatory features from modality-specific channels, followed by the identity alignment module that aims to align person identities jointly at the global-level and part-level aspects. Finally, the pseudo-label generation module is used to select samples with reliable pseudo labels from the unlabeled samples based on the confidence level. Moreover, we propose the dynamic center-based cross-entropy loss to constrain the distance of similar samples. 
Experiments on widely used cross-modality Re-ID datasets demonstrate that CPL can achieve the state-of-the-art SSCM Re-ID performance.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"161 ","pages":"Article 105602"},"PeriodicalIF":4.2000,"publicationDate":"2025-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Image and Vision Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0262885625001908","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
Visible-infrared person re-identification (RGB-IR Re-ID) aims to find images of the same identity across different modalities. In practice, multiple persons and cameras can provide abundant training samples, but non-negligible modality differences make manual labeling of all samples impractical. How to accurately re-identify cross-modality pedestrians with only a few labeled samples and a large quantity of unlabeled samples is therefore an important research question. However, person re-identification in this scenario, which we call Semi-Supervised Cross-Modality Re-ID (SSCM Re-ID), has not been well studied. In this paper, we propose a cross-modality pseudo label learning (CPL) framework for the SSCM Re-ID task. It consists of three modules: the feature mapping module, the identity alignment module, and the pseudo-label generation module. The feature mapping module is designed to extract shared discriminative features from modality-specific channels, followed by the identity alignment module, which aligns person identities jointly at the global and part levels. Finally, the pseudo-label generation module selects samples with reliable pseudo labels from the unlabeled samples based on their confidence. Moreover, we propose a dynamic center-based cross-entropy loss to constrain the distance between similar samples. Experiments on widely used cross-modality Re-ID datasets demonstrate that CPL achieves state-of-the-art SSCM Re-ID performance.
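The confidence-based pseudo-label selection described in the abstract can be sketched as follows. This is a minimal, generic illustration of threshold-based selection, not the paper's actual formulation: the function name, the fixed threshold of 0.9, and the use of softmax outputs are all assumptions for the sake of the example.

```python
import numpy as np

def select_pseudo_labels(probs, threshold=0.9):
    """Keep unlabeled samples whose top-class confidence exceeds the threshold.

    probs: (n_samples, n_identities) array of per-identity probabilities.
    Returns the indices of the selected samples and their pseudo labels.
    Note: the threshold and the selection rule are illustrative assumptions,
    not the CPL paper's exact criterion.
    """
    confidences = probs.max(axis=1)      # highest predicted probability per sample
    pseudo_labels = probs.argmax(axis=1) # identity with that probability
    mask = confidences >= threshold      # only keep confident predictions
    return np.flatnonzero(mask), pseudo_labels[mask]

# Example: softmax outputs for four unlabeled samples over three identities.
probs = np.array([
    [0.95, 0.03, 0.02],  # confident -> selected
    [0.40, 0.35, 0.25],  # ambiguous -> rejected
    [0.10, 0.85, 0.05],  # below threshold -> rejected
    [0.01, 0.01, 0.98],  # confident -> selected
])
idx, labels = select_pseudo_labels(probs, threshold=0.9)
```

Selected samples would then be added to the labeled pool with their pseudo labels for the next training round; low-confidence samples remain unlabeled rather than risk propagating label noise.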
Journal Introduction:
Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.