CLIP-Based Camera-Agnostic Feature Learning for Intra-Camera Supervised Person Re-Identification

IF 8.3 · CAS Region 1 (Engineering & Technology) · JCR Q1 (ENGINEERING, ELECTRICAL & ELECTRONIC)
Xuan Tan;Xun Gong;Yang Xiang
{"title":"基于clip的摄像机不可知特征学习在摄像机内监督下的人物再识别","authors":"Xuan Tan;Xun Gong;Yang Xiang","doi":"10.1109/TCSVT.2024.3522178","DOIUrl":null,"url":null,"abstract":"Contrastive Language-Image Pre-Training (CLIP) model excels in traditional person re-identification (ReID) tasks due to its inherent advantage in generating textual descriptions for pedestrian images. However, applying CLIP directly to intra-camera supervised person re-identification (ICS ReID) presents challenges. ICS ReID requires independent identity labeling within each camera, without associations across cameras. This limits the effectiveness of text-based enhancements. To address this, we propose a novel framework called CLIP-based Camera-Agnostic Feature Learning (CCAFL) for ICS ReID. Accordingly, two custom modules are designed to guide the model to actively learn camera-agnostic pedestrian features: Intra-Camera Discriminative Learning (ICDL) and Inter-Camera Adversarial Learning (ICAL). Specifically, we first establish learnable textual prompts for intra-camera pedestrian images to obtain crucial semantic supervision signals for subsequent intra- and inter-camera learning. Then, we design ICDL to increase inter-class variation by considering the hard positive and hard negative samples within each camera, thereby learning intra-camera finer-grained pedestrian features. Additionally, we propose ICAL to reduce inter-camera pedestrian feature discrepancies by penalizing the model’s ability to predict the camera from which a pedestrian image originates, thus enhancing the model’s capability to recognize pedestrians from different viewpoints. Extensive experiments on popular ReID datasets demonstrate the effectiveness of our approach. Especially, on the challenging MSMT17 dataset, we arrive at 58.9% in terms of mAP accuracy, surpassing state-of-the-art methods by 7.6%. Code is available at <uri>https://gitee.com/swjtugx/classmate/tree/master/OurGroup/CCAFL</uri>.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 5","pages":"4100-4115"},"PeriodicalIF":8.3000,"publicationDate":"2024-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"CLIP-Based Camera-Agnostic Feature Learning for Intra-Camera Supervised Person Re-Identification\",\"authors\":\"Xuan Tan;Xun Gong;Yang Xiang\",\"doi\":\"10.1109/TCSVT.2024.3522178\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Contrastive Language-Image Pre-Training (CLIP) model excels in traditional person re-identification (ReID) tasks due to its inherent advantage in generating textual descriptions for pedestrian images. However, applying CLIP directly to intra-camera supervised person re-identification (ICS ReID) presents challenges. ICS ReID requires independent identity labeling within each camera, without associations across cameras. This limits the effectiveness of text-based enhancements. To address this, we propose a novel framework called CLIP-based Camera-Agnostic Feature Learning (CCAFL) for ICS ReID. Accordingly, two custom modules are designed to guide the model to actively learn camera-agnostic pedestrian features: Intra-Camera Discriminative Learning (ICDL) and Inter-Camera Adversarial Learning (ICAL). Specifically, we first establish learnable textual prompts for intra-camera pedestrian images to obtain crucial semantic supervision signals for subsequent intra- and inter-camera learning. 
Then, we design ICDL to increase inter-class variation by considering the hard positive and hard negative samples within each camera, thereby learning intra-camera finer-grained pedestrian features. Additionally, we propose ICAL to reduce inter-camera pedestrian feature discrepancies by penalizing the model’s ability to predict the camera from which a pedestrian image originates, thus enhancing the model’s capability to recognize pedestrians from different viewpoints. Extensive experiments on popular ReID datasets demonstrate the effectiveness of our approach. Especially, on the challenging MSMT17 dataset, we arrive at 58.9% in terms of mAP accuracy, surpassing state-of-the-art methods by 7.6%. Code is available at <uri>https://gitee.com/swjtugx/classmate/tree/master/OurGroup/CCAFL</uri>.\",\"PeriodicalId\":13082,\"journal\":{\"name\":\"IEEE Transactions on Circuits and Systems for Video Technology\",\"volume\":\"35 5\",\"pages\":\"4100-4115\"},\"PeriodicalIF\":8.3000,\"publicationDate\":\"2024-12-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Circuits and Systems for Video Technology\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10813454/\",\"RegionNum\":1,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems for Video Technology","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10813454/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0

Abstract

The Contrastive Language-Image Pre-Training (CLIP) model excels in traditional person re-identification (ReID) tasks thanks to its inherent ability to generate textual descriptions for pedestrian images. However, applying CLIP directly to intra-camera supervised person re-identification (ICS ReID) is challenging: ICS ReID assumes independent identity labeling within each camera, with no associations across cameras, which limits the effectiveness of text-based enhancements. To address this, we propose a novel framework called CLIP-based Camera-Agnostic Feature Learning (CCAFL) for ICS ReID. Two custom modules guide the model to actively learn camera-agnostic pedestrian features: Intra-Camera Discriminative Learning (ICDL) and Inter-Camera Adversarial Learning (ICAL). Specifically, we first establish learnable textual prompts for intra-camera pedestrian images to obtain crucial semantic supervision signals for subsequent intra- and inter-camera learning. We then design ICDL to increase inter-class variation by considering hard positive and hard negative samples within each camera, thereby learning finer-grained intra-camera pedestrian features. Additionally, we propose ICAL to reduce inter-camera pedestrian feature discrepancies by penalizing the model's ability to predict the camera from which a pedestrian image originates, thus enhancing the model's capability to recognize pedestrians from different viewpoints. Extensive experiments on popular ReID datasets demonstrate the effectiveness of our approach. In particular, on the challenging MSMT17 dataset, we reach 58.9% mAP, surpassing state-of-the-art methods by 7.6%. Code is available at https://gitee.com/swjtugx/classmate/tree/master/OurGroup/CCAFL.
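To make the prompt-learning step concrete, below is a minimal CoOp-style sketch of learnable textual prompts built on OpenAI's clip package. This is an illustrative reading of the abstract, not the authors' code: the template, the context length, the class word "person", and the per-identity context design are all assumptions.

```python
import torch
import torch.nn as nn
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

class PromptLearner(nn.Module):
    """Learnable textual prompts, one context per intra-camera identity
    (CoOp-style). A hypothetical sketch, not the paper's implementation."""
    def __init__(self, clip_model, num_ids, n_ctx=4):
        super().__init__()
        width = clip_model.ln_final.weight.shape[0]
        # One learnable context of n_ctx pseudo-tokens per identity.
        self.ctx = nn.Parameter(0.02 * torch.randn(num_ids, n_ctx, width))
        # Tokenize a template whose "X" placeholders we will overwrite.
        tokenized = clip.tokenize(" ".join(["X"] * n_ctx) + " person.")
        with torch.no_grad():
            emb = clip_model.token_embedding(tokenized)  # [1, 77, width]
        self.register_buffer("prefix", emb[:, :1, :])          # SOT token
        self.register_buffer("suffix", emb[:, 1 + n_ctx:, :])  # "person." + EOT + pad
        self.eot_pos = int(tokenized.argmax(dim=-1))           # EOT token index

    def forward(self, clip_model):
        n = self.ctx.size(0)
        prompts = torch.cat(
            [self.prefix.expand(n, -1, -1), self.ctx,
             self.suffix.expand(n, -1, -1)], dim=1)            # [n, 77, width]
        x = prompts + clip_model.positional_embedding
        x = clip_model.transformer(x.permute(1, 0, 2)).permute(1, 0, 2)
        x = clip_model.ln_final(x)
        # Text feature is read at the EOT position, as in CLIP.encode_text.
        return x[:, self.eot_pos] @ clip_model.text_projection

model, _ = clip.load("ViT-B/16", device="cpu")          # fp32 on CPU
text_feats = PromptLearner(model, num_ids=100)(model)    # [100, embed_dim]
```

These per-identity text features would then serve as the semantic supervision signals the abstract mentions for the subsequent intra- and inter-camera learning stages.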
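ICDL's hard-sample treatment can be read as a batch-hard margin loss computed independently inside each camera: for every anchor, push its hardest positive closer than its hardest negative. The sketch below works under that assumption; the paper's actual loss formulation may differ.

```python
import torch
import torch.nn.functional as F

def icdl_loss(feats, pids, cams, margin=0.3):
    """Batch-hard margin loss per camera -- a hypothetical reading of ICDL.
    pids are intra-camera identity labels; cams are camera indices."""
    feats = F.normalize(feats, dim=1)
    loss, n_terms = feats.new_zeros(()), 0
    for cam in cams.unique():
        sel = cams == cam
        f, p = feats[sel], pids[sel]
        if f.size(0) < 2:
            continue
        dist = torch.cdist(f, f)                        # pairwise L2 distances
        same_id = p.unsqueeze(0) == p.unsqueeze(1)      # same-identity mask
        diag = torch.eye(f.size(0), dtype=torch.bool, device=f.device)
        for i in range(f.size(0)):
            pos = dist[i][same_id[i] & ~diag[i]]        # other views of anchor i
            neg = dist[i][~same_id[i]]                  # different identities
            if pos.numel() == 0 or neg.numel() == 0:
                continue
            # Hardest positive must sit closer than hardest negative by margin.
            loss = loss + F.relu(pos.max() - neg.min() + margin)
            n_terms += 1
    return loss / max(n_terms, 1)

# Toy batch: identities 0-3 spread over two cameras.
feats = torch.randn(8, 128, requires_grad=True)
pids  = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
cams  = torch.tensor([0, 0, 0, 0, 1, 1, 1, 1])
icdl_loss(feats, pids, cams).backward()
```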
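ICAL penalizes the model's ability to predict a pedestrian image's source camera. A standard way to implement such a penalty is a camera classifier trained through a gradient reversal layer, as in DANN; the sketch below assumes that mechanism, which the abstract does not spell out.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negated, scaled gradient backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None

class CameraAdversary(nn.Module):
    """Camera classifier behind a gradient reversal layer: the head learns
    to predict the source camera, while the reversed gradient pushes the
    backbone toward camera-agnostic features (one plausible mechanism for
    ICAL, not the paper's confirmed implementation)."""
    def __init__(self, feat_dim, num_cams, lam=1.0):
        super().__init__()
        self.lam = lam
        self.head = nn.Linear(feat_dim, num_cams)

    def forward(self, feats):
        return self.head(GradReverse.apply(feats, self.lam))

# Usage: add the camera cross-entropy to the ReID objective.
feats = torch.randn(8, 128, requires_grad=True)
adv = CameraAdversary(feat_dim=128, num_cams=4)
cam_loss = F.cross_entropy(adv(feats), torch.randint(0, 4, (8,)))
cam_loss.backward()  # reversed gradient: minimizing cam_loss confuses cameras
```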
Source journal
CiteScore: 13.80
Self-citation rate: 27.40%
Publication volume: 660
Review time: 5 months
About the journal: The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.