Zichang Tan, Guiwei Zhang, Zihui Tan, Prayag Tiwari, Yi Wang, Yang Yang
Title: CAM2Former: Fusion of Camera-specific Class Activation Map matters for occluded person re-identification
Journal: Information Fusion, Vol. 120, Article 103011 (JCR Q1, Computer Science, Artificial Intelligence)
DOI: 10.1016/j.inffus.2025.103011
Published: 2025-02-24
URL: https://www.sciencedirect.com/science/article/pii/S1566253525000843
Citations: 0
Abstract
Occluded person re-identification (ReID) is challenging because persons are frequently perturbed by various occlusions. Existing mainstream schemes prioritize aligning fine-grained body parts using error-prone, computation-intensive auxiliary information, which can incur high estimation error and heavy computation. To this end, we present the CAMera-specific Class Activation Map (CAM²), designed to identify critical foreground components with interpretability and computational efficiency. Building on this foundation, we introduce the CAM²-guided Vision Transformer, termed CAM²Former, with three core designs. First, we develop Fusion of CAMera-specific Class Activation Maps, termed CAM²Fusion, which consists of positive and negative CAM² that operate in synergy to capture visual patterns representative of discriminative foreground components. Second, to enhance the representation of pivotal foreground components, we introduce a CAM²Fusion-attention mechanism, which imposes sparse attention weights on identity-agnostic interference discerned by the positive and negative CAM². Third, since the enhancement of foreground representations in CAM²Former depends on camera-specific classifiers, which are unavailable during inference, we introduce a consistent learning scheme that ensures representations derived from a vanilla ViT align with those obtained via CAM²Former. This facilitates the extraction of discriminative foreground representations while circumventing CAM² dependencies during inference, without additional complexity. Extensive experiments demonstrate state-of-the-art performance on two occluded datasets (Occluded-Duke and Occluded-REID) and two holistic datasets (Market1501 and MSMT17), achieving an R1 of 74.4% and a mAP of 64.8% on Occluded-Duke.
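The abstract does not give the paper's exact formulation, but the two core ideas — a class activation map computed from classifier weights, and attention that suppresses low-activation (background/occlusion) tokens — can be sketched generically. The shapes, function names, and the quantile-thresholding rule below are illustrative assumptions, not the authors' method.

```python
import numpy as np

def class_activation_map(features, classifier_weights, class_id):
    """Generic CAM over patch tokens: project each token's feature onto one
    class's classifier weight vector, then min-max normalize to [0, 1].
    features: (num_tokens, dim); classifier_weights: (num_classes, dim)."""
    cam = features @ classifier_weights[class_id]          # (num_tokens,)
    cam = cam - cam.min()
    denom = cam.max() if cam.max() > 0 else 1.0
    return cam / denom

def cam_masked_attention(q, k, v, cam, keep_ratio=0.5):
    """CAM-guided attention sketch: keys whose CAM score falls in the lowest
    (1 - keep_ratio) fraction are treated as identity-agnostic interference
    and receive (near-)zero attention weight."""
    scores = q @ k.T / np.sqrt(q.shape[-1])                # (n, n) raw logits
    thresh = np.quantile(cam, 1.0 - keep_ratio)
    scores[:, cam < thresh] = -1e9                         # sparsify: mask low-CAM keys
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)         # row-wise softmax
    return weights @ v                                     # (n, dim)
```

At inference, the paper's consistency scheme avoids the camera-specific classifiers entirely; in this sketch that would correspond to training the plain (unmasked) attention path to match the CAM-masked path's outputs, then discarding `class_activation_map` at test time.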
About the journal:
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses as well as those demonstrating their application to real-world problems will be welcome.