Visible-Infrared Person Re-Identification With Real-World Label Noise
Authors: Ruiheng Zhang; Zhe Cao; Yan Huang; Shuo Yang; Lixin Xu; Min Xu
Journal: IEEE Transactions on Circuits and Systems for Video Technology, vol. 35, no. 5, pp. 4857-4869
Publication date: 2025-01-06
Publication type: Journal Article
DOI: 10.1109/TCSVT.2025.3526449
URL: https://ieeexplore.ieee.org/document/10829635/
Impact factor: 8.3; JCR: Q1 (Engineering, Electrical & Electronic)
Open access: No
Citations: 0
Abstract
In recent years, growing demand for advanced security and traffic management has drawn considerable attention to visible-infrared person re-identification (VI-ReID). A critical challenge in VI-ReID is performance degradation caused by label noise, an issue that is more pronounced in cross-modal scenarios because data are more easily confused. While previous methods have achieved notable successes, they often overlook instance-dependent and real-world noise, which distances them from practical applications of person re-identification. To bridge this gap, our research analyzes the primary sources of label noise in real-world settings: a) instantiated identities, b) blurry infrared images, and c) annotators’ errors. In response, we develop a Robust Hybrid Loss function (RHL) that enables targeted recognition and retrieval optimization through a more fine-grained division of the noisy dataset. The proposed method categorizes data into three sets (clean, obviously noisy, and indistinguishably noisy) and computes a bespoke loss for each. The identification loss is structured to address the differing nature of these sets. For the retrieval sub-task, we use an enhanced triplet loss designed to handle noisy correspondences. Furthermore, to validate our method empirically, we have re-annotated a real-world dataset, SYSU-Real. Our experiments on SYSU-MM01 and RegDB, conducted under various ratios of random and instance-dependent label noise, demonstrate the robustness and effectiveness of the proposed approach.
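The three-way division named in the abstract can be illustrated with a small sketch. The abstract does not state how samples are assigned to each set, so the rule below is purely an assumption: a hypothetical two-threshold split on per-sample loss, where low-loss samples are treated as clean, high-loss samples as obviously noisy, and everything in between as indistinguishably noisy. The function name `partition_by_loss` and the threshold values are illustrative only and are not taken from the paper.

```python
from typing import Dict, List


def partition_by_loss(losses: List[float], low: float, high: float) -> Dict[str, List[int]]:
    """Split sample indices into three sets using two loss thresholds.

    Assumption (not from the paper): a sample's training loss is used as a
    proxy for how likely its label is to be wrong.
    """
    if low > high:
        raise ValueError("low must not exceed high")

    groups: Dict[str, List[int]] = {
        "clean": [],
        "indistinguishably_noisy": [],
        "obviously_noisy": [],
    }
    for idx, loss in enumerate(losses):
        if loss <= low:
            groups["clean"].append(idx)            # confidently fits its label
        elif loss >= high:
            groups["obviously_noisy"].append(idx)  # strongly disagrees with its label
        else:
            groups["indistinguishably_noisy"].append(idx)  # ambiguous middle band
    return groups


# Hypothetical per-sample losses for six training images.
per_sample_losses = [0.05, 2.9, 0.8, 0.1, 3.4, 1.2]
groups = partition_by_loss(per_sample_losses, low=0.5, high=2.0)
print(groups)
```

Each resulting set could then be passed to its own loss term, which is the role the abstract assigns to the per-category loss calculations; how those terms are defined is not covered by this sketch.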
About the journal:
The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.