Mingxin Yu, Yiyuan Ge, Zhihao Chen, Rui You, Lianqing Zhu, Mingwei Lin, Zeshui Xu

Information Fusion, Volume 122, Article 103185. Published 2025-04-29. DOI: 10.1016/j.inffus.2025.103185
No escape: Towards suggestive clues guidance for cross-modality person re-identification
Criminal activities are frequently committed at night to avoid attention, which seriously challenges traditional re-identification (ReID) systems. Recently, visible–infrared person re-identification (VI-ReID) has drawn attention for its wide applicability in low-light scenes; it aims to match pedestrians across the inherent modality gap between infrared images (captured at night) and visible images (captured in daytime). Previous deep learning-based methods mainly bridge the modality gap either by cross-modality translation or by learning a modality-shared representation. However, the former inevitably damages the original modality information, while the latter ignores fine-grained intrinsic metric relationships between cross-spectral features. In this paper, we propose a suggestive-clues reconfiguration (SCR) framework, which includes representation learning and feature reconfiguration sub-networks. Representation learning is pursued in the modality-shared domain, where we propose a local cross-alignment (LCA) loss to further optimize the metric between cross-modality clustering components and centers, exploring fine-grained modality-consistent representations. In the feature reconfiguration network, we decouple infrared and visible modality features and introduce a reconfiguration encoder to learn identity-related suggestive clues, enhancing the controllability of cross-modality learning. Extensive experiments on the SYSU-MM01 and RegDB datasets demonstrate that SCR sets a new state of the art; on RegDB, SCR reaches 97.9% Rank-1 and about 100% Rank-10 accuracy. Our research highlights the role of suggestive clues in VI-ReID, and our code is available at: https://github.com/ISCLab-Bistu/VI-ReID.
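The abstract describes aligning cross-modality cluster centers per identity. The exact LCA formulation is in the paper, not reproduced here; the following is only a minimal illustrative sketch of the general idea, in which each identity's visible-light and infrared feature centers are pulled together by a squared-distance penalty. The function name and all inputs are hypothetical, not the authors' API.

```python
def center_alignment_loss(feats_vis, feats_ir, labels_vis, labels_ir):
    """Toy cross-modality center-alignment penalty (illustrative only).

    feats_*  : list of feature vectors (lists of floats), one per image
    labels_* : list of identity labels, parallel to feats_*
    Returns the mean squared Euclidean distance between the visible
    and infrared centers of each identity seen in both modalities.
    """
    def centers(feats, labels):
        # Average all feature vectors belonging to the same identity.
        sums, counts = {}, {}
        for f, y in zip(feats, labels):
            acc = sums.setdefault(y, [0.0] * len(f))
            for i, v in enumerate(f):
                acc[i] += v
            counts[y] = counts.get(y, 0) + 1
        return {y: [v / counts[y] for v in s] for y, s in sums.items()}

    c_vis = centers(feats_vis, labels_vis)
    c_ir = centers(feats_ir, labels_ir)
    shared = set(c_vis) & set(c_ir)  # identities present in both modalities
    if not shared:
        return 0.0
    # Squared distance between the two modality centers, per identity.
    loss = sum(
        sum((a - b) ** 2 for a, b in zip(c_vis[y], c_ir[y]))
        for y in shared
    )
    return loss / len(shared)
```

In practice such a term would be computed on mini-batch embeddings inside a deep network and combined with identity-classification and triplet losses; this sketch only shows the center-to-center metric being optimized.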
About the journal:
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers presenting fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.