CAM2Former: Fusion of Camera-specific Class Activation Map matters for occluded person re-identification

Impact Factor 14.7 · CAS Tier 1, Computer Science · JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Zichang Tan, Guiwei Zhang, Zihui Tan, Prayag Tiwari, Yi Wang, Yang Yang
Journal: Information Fusion, Volume 120, Article 103011
DOI: 10.1016/j.inffus.2025.103011
Published: 2025-02-24
Cited by: 0

Abstract

Occluded person re-identification (ReID) is challenging because persons are frequently perturbed by various occlusions. Existing mainstream schemes prioritize the alignment of fine-grained body parts using auxiliary information that is error-prone and computation-intensive, and thus may suffer from high estimation error and heavy computation. To this end, we present the Camera-specific Class Activation Map (CAM2), designed to identify critical foreground components with interpretability and computational efficiency. Building on this foundation, we propose the CAM2-guided Vision Transformer, termed CAM2Former, with three core designs. First, we develop the Fusion of Camera-specific Class Activation Maps, termed CAM2Fusion, which consists of positive and negative CAM2 maps that operate in synergy to capture visual patterns representative of discriminative foreground components. Second, to enhance the representation of pivotal foreground components, we introduce a CAM2Fusion-attention mechanism. This strategy imposes sparse attention weights on identity-agnostic interference discerned by the positive and negative CAM2. Third, since the enhancement of foreground representations in CAM2Former depends on camera-specific classifiers, which are not available during inference, we introduce a consistent learning scheme. This design ensures that representations derived from a vanilla ViT align consistently with those obtained via CAM2Former, facilitating the extraction of discriminative foreground representations while circumventing CAM2 dependencies during inference without additional complexity. Extensive experimental results demonstrate that the proposed method achieves state-of-the-art performance on two occluded datasets (Occluded-Duke and Occluded-REID) and two holistic datasets (Market1501 and MSMT17), achieving an R1 of 74.4% and a mAP of 64.8% on Occluded-Duke.
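For readers unfamiliar with class activation maps, the sketch below shows the standard CAM construction (Zhou et al.) on which the abstract's camera-specific CAM2 builds: conv feature maps are weighted channel-wise by a class's classifier weights to localize class-relevant regions. This is an illustrative sketch only, not the paper's implementation; the function name and the camera-specific, positive/negative variants described in the abstract are not reproduced here.

```python
import numpy as np

def class_activation_map(feature_maps, classifier_weights, class_idx):
    """Standard CAM: feature_maps is (C, H, W), classifier_weights is (num_classes, C).

    The map for a class is the channel-wise weighted sum of feature maps,
    using that class's classifier weights as channel importances.
    """
    w = classifier_weights[class_idx]                      # (C,)
    cam = np.tensordot(w, feature_maps, axes=([0], [0]))   # contract channels -> (H, W)
    cam = np.maximum(cam, 0.0)                             # keep positive evidence only
    if cam.max() > 0:
        cam = cam / cam.max()                              # normalize to [0, 1]
    return cam

# Hypothetical shapes for a person-ReID backbone: 256 channels over a 16x8 grid,
# with a 751-identity classifier (as in Market1501).
rng = np.random.default_rng(0)
feats = rng.standard_normal((256, 16, 8))
weights = rng.standard_normal((751, 256))
cam = class_activation_map(feats, weights, class_idx=3)
print(cam.shape)
```

In the paper's setting, per-camera classifiers would yield one such map per camera view; the "negative" variant would instead highlight regions supporting non-target identities, which the attention mechanism can then suppress.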


Source journal: Information Fusion (Engineering & Technology – Computer Science: Theory & Methods)
CiteScore: 33.20
Self-citation rate: 4.30%
Articles per year: 161
Review time: 7.9 months
Journal description: Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers presenting fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.