{"title":"Dual-Level Matching With Outlier Filtering for Unsupervised Visible-Infrared Person Re-Identification","authors":"Mang Ye;Zesen Wu;Bo Du","doi":"10.1109/TPAMI.2025.3541053","DOIUrl":null,"url":null,"abstract":"Visible-infrared person re-identification (VI-ReID) is a challenging cross-modality retrieval task due to the large modality gap. While numerous efforts have been devoted to the supervised setting with a large amount of labeled cross-modality correspondences, few studies have tried to mitigate the modality gap by mining cross-modality correspondences in an unsupervised manner. However, existing works failed to capture the intrinsic relations among samples across two modalities, resulting in limited performance outcomes. In this paper, we propose a novel Progressive Graph Matching (PGM) approach to globally model the cross-modality relationships and instance-level affinities. PGM formulates cross-modality correspondence mining as a graph matching procedure, aiming to integrate global information by minimizing global matching costs. Considering that samples in wrong clusters cannot find reliable cross-modality correspondences by PGM, we further introduce a robust Dual-Level Matching (DLM) mechanism, combining the cluster-level PGM and Nearest Instance-Cluster Searching (NICS) with instance-level affinity optimization. Additionally, we design an Outlier Filter Strategy (OFS) to filter out unreliable cross-modality correspondences based on the dual-level relation constraints. To mitigate false accumulation in cross-modal correspondence learning, an Alternate Cross Contrastive Learning (ACCL) module is proposed to alternately adjust the dominated matching, i.e., visible-to-infrared or infrared-to-visible matching. 
Empirical results demonstrate the superiority of our unsupervised solution, achieving comparable performance with supervised counterparts.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 5","pages":"3815-3829"},"PeriodicalIF":0.0000,"publicationDate":"2025-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on pattern analysis and machine intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10882953/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Visible-infrared person re-identification (VI-ReID) is a challenging cross-modality retrieval task due to the large modality gap. While numerous efforts have been devoted to the supervised setting with large amounts of labeled cross-modality correspondences, few studies have tried to mitigate the modality gap by mining cross-modality correspondences in an unsupervised manner. However, existing works fail to capture the intrinsic relations among samples across the two modalities, limiting their performance. In this paper, we propose a novel Progressive Graph Matching (PGM) approach to globally model the cross-modality relationships and instance-level affinities. PGM formulates cross-modality correspondence mining as a graph matching procedure, integrating global information by minimizing the global matching cost. Considering that samples in wrong clusters cannot find reliable cross-modality correspondences through PGM, we further introduce a robust Dual-Level Matching (DLM) mechanism, combining the cluster-level PGM and Nearest Instance-Cluster Searching (NICS) with instance-level affinity optimization. Additionally, we design an Outlier Filter Strategy (OFS) to filter out unreliable cross-modality correspondences based on the dual-level relation constraints. To mitigate error accumulation in cross-modality correspondence learning, an Alternate Cross Contrastive Learning (ACCL) module is proposed to alternately adjust the dominant matching direction, i.e., visible-to-infrared or infrared-to-visible matching. Empirical results demonstrate the superiority of our unsupervised solution, achieving performance comparable to that of supervised counterparts.
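The core idea behind PGM — choosing cross-modality cluster correspondences by minimizing a global matching cost rather than matching each cluster to its nearest neighbor independently — can be illustrated with a minimal sketch. This is not the paper's implementation: the centroids, function names, and brute-force search over permutations are all illustrative (a real system would use the Hungarian algorithm on learned features).

```python
# Illustrative sketch (NOT the paper's code) of global cluster matching:
# cast cross-modality correspondence mining as an assignment problem and
# pick the visible->infrared matching with the minimum total cost.
# Brute force over permutations for clarity; larger cluster counts would
# call for the Hungarian algorithm instead.
from itertools import permutations
import math


def global_match(vis_centroids, ir_centroids):
    """Return the visible->infrared cluster assignment (as a permutation)
    that minimizes the total Euclidean matching cost."""
    n = len(vis_centroids)
    best_cost, best_assign = math.inf, None
    for perm in permutations(range(n)):
        cost = sum(
            math.dist(vis_centroids[i], ir_centroids[perm[i]])
            for i in range(n)
        )
        if cost < best_cost:
            best_cost, best_assign = cost, perm
    return best_assign, best_cost


# Toy 2-D centroids (hypothetical): greedy nearest-neighbor matching would
# pair visible cluster 0 with infrared cluster 0 (distance 0.4), forcing
# visible cluster 1 onto the far infrared cluster 1 (distance 2.0, total
# 2.4). The global minimum instead swaps the pairs for a total cost of 1.6.
vis = [(0.0, 0.0), (1.0, 0.0)]
ir = [(0.4, 0.0), (-1.0, 0.0)]
assign, total = global_match(vis, ir)
print(assign, total)  # (1, 0) 1.6
```

The toy example shows why the abstract emphasizes integrating global information: a locally optimal (greedy) choice for one cluster can force a much worse choice elsewhere, whereas minimizing the summed cost trades a slightly worse first pair for a much better overall matching.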