Camera-aware graph multi-domain adaptive learning for unsupervised person re-identification

Impact Factor 7.6; Zone 1 (Computer Science); JCR Q1 (Computer Science, Artificial Intelligence)

Pattern Recognition. Pub Date: 2024-11-28. DOI: 10.1016/j.patcog.2024.111217

Zhidan Ran, Xiaobo Lu, Xuan Wei, Wei Liu

Pattern Recognition, Volume 161, Article 111217 (Journal Article). https://www.sciencedirect.com/science/article/pii/S0031320324009683

Citations: 0

Abstract

Unsupervised person re-identification (Re-ID) has recently attracted considerable attention because of its practical value in real-world scenarios where pairwise labeled data are unavailable. A key challenge for unsupervised person Re-ID is learning discriminative and robust feature representations under cross-camera scene variation. Contrastive learning approaches treat unsupervised representation learning as a dictionary look-up task, but existing methods ignore both intra- and inter-camera semantic associations during training. In this paper, we propose a novel unsupervised person Re-ID framework, Camera-Aware Graph Multi-Domain Adaptive Learning (CGMAL), which performs multi-domain feature transfer with semantic propagation to learn discriminative, domain-invariant representations. Specifically, we treat each camera as a distinct domain and draw image samples from every camera domain to form a mini-batch. A heterogeneous graph is constructed to represent the relationships among all instances in the mini-batch. A Graph Convolutional Network (GCN) then fuses the image samples into a unified space and performs semantic transfer across them to produce improved feature representations. Subsequently, we construct a memory-based non-parametric contrastive loss to train the model. In addition, we design an adversarial training scheme that transfers the knowledge learned by the GCN to the feature extractor. Experiments on three benchmarks show that the proposed approach outperforms state-of-the-art unsupervised methods.
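The pipeline in the abstract (camera-balanced mini-batches, graph-based feature fusion, and a memory-based contrastive loss) can be illustrated with a minimal sketch. The abstract does not give the actual graph construction, GCN architecture, memory update rule, or adversarial scheme, so everything below is an assumption made for illustration: the function names (`sample_camera_balanced_batch`, `propagate`, `memory_contrastive_loss`), the cosine-similarity threshold, the weight-free neighbour averaging that stands in for a learned GCN layer, and the InfoNCE form of the loss are all hypothetical choices, not the authors' method.

```python
import math
import random
from collections import defaultdict


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sample_camera_balanced_batch(camera_ids, per_camera, rng):
    """Draw `per_camera` sample indices from every camera, treating each camera as a domain."""
    by_camera = defaultdict(list)
    for index, cam in enumerate(camera_ids):
        by_camera[cam].append(index)
    batch = []
    for cam in sorted(by_camera):
        pool = by_camera[cam]
        if len(pool) >= per_camera:
            batch.extend(rng.sample(pool, per_camera))
        else:
            # Fall back to sampling with replacement when a camera has too few images.
            batch.extend(rng.choices(pool, k=per_camera))
    return batch


def propagate(features, threshold=0.5):
    """One weight-free propagation step (assumed stand-in for a GCN layer).

    Each node is replaced by the mean of itself and every node whose cosine
    similarity to it is at least `threshold`, then re-normalized.
    """
    feats = [l2_normalize(f) for f in features]
    fused = []
    for i, fi in enumerate(feats):
        neighbours = [fj for j, fj in enumerate(feats)
                      if j == i or dot(fi, fj) >= threshold]
        mean = [sum(col) / len(neighbours) for col in zip(*neighbours)]
        fused.append(l2_normalize(mean))
    return fused


def memory_contrastive_loss(query, memory, positive, temperature=0.05):
    """InfoNCE over a memory bank: -log softmax(q . m / T) at index `positive`."""
    q = l2_normalize(query)
    logits = [dot(q, l2_normalize(m)) / temperature for m in memory]
    peak = max(logits)  # subtract the max before exponentiating, for stability
    log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_denominator - logits[positive]


# Toy usage: 3 cameras with 3 images each, 4-dimensional random features.
rng = random.Random(0)
camera_ids = [0, 0, 0, 1, 1, 1, 2, 2, 2]
batch = sample_camera_balanced_batch(camera_ids, per_camera=2, rng=rng)
features = [[rng.gauss(0, 1) for _ in range(4)] for _ in batch]
fused = propagate(features)
memory = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]  # hypothetical cluster centroids
loss = memory_contrastive_loss(fused[0], memory, positive=0)
```

In this sketch the loss is small when the query lies close to its positive memory entry and grows as it drifts toward other entries. A real implementation would learn the propagation weights, update the memory during training, and add the adversarial objective, none of which is shown here.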

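Source journal

Pattern Recognition (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 14.40

Self-citation rate: 16.20%

Articles published: 683

Review time: 5.6 months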

About the journal: The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.