{"title":"Multi-Context Grouped Attention for Unsupervised Person Re-Identification","authors":"Kshitij Nikhal;Benjamin S. Riggan","doi":"10.1109/TBIOM.2022.3226678","DOIUrl":null,"url":null,"abstract":"Recent advancements like multiple contextual analysis, attention mechanisms, distance-aware optimization, and multi-task guidance have been widely used for supervised person re-identification (ReID), but the implementation and effects of such methods in unsupervised ReID frameworks are non-trivial and unclear, respectively. Moreover, with increasing size and complexity of image- and video-based ReID datasets, manual or semi-automated annotation procedures for supervised ReID are becoming labor intensive and cost prohibitive, which is undesirable especially considering the likelihood of annotation errors increase with scale/complexity of data collections. Therefore, we propose a new iterative clustering framework that is insensitive to annotation errors and over-fitting ReID annotations (i.e., labels). Our proposed unsupervised framework incorporates (a) a novel multi-context group attention architecture that learns a holistic attention map from multiple local and global contexts, (b) an unsupervised clustering loss function that down-weights easily discriminative identities, and (c) a background diversity term that helps cluster persons across different cross-camera views without leveraging any identification or camera labels. We perform extensive analysis using the DukeMTMC-VideoReID and MARS video-based ReID datasets and the MSMT17 image-based ReID dataset. Our approach is shown to provide a new state-of-the-art performance for unsupervised ReID, reducing the rank-1 performance gap between supervised and unsupervised ReID to 1.1%, 12.1%, and 21.9% from 6.1%, 17.9%, and 22.6% for DukeMTMC, MARS, and MSMT17 datasets, respectively.","PeriodicalId":73307,"journal":{"name":"IEEE transactions on biometrics, behavior, and identity science","volume":"5 2","pages":"170-182"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on biometrics, behavior, and identity science","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/9978648/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 4
Abstract
Recent advancements such as multiple contextual analysis, attention mechanisms, distance-aware optimization, and multi-task guidance have been widely used for supervised person re-identification (ReID), but the implementation and effects of such methods in unsupervised ReID frameworks are non-trivial and unclear, respectively. Moreover, with the increasing size and complexity of image- and video-based ReID datasets, manual or semi-automated annotation procedures for supervised ReID are becoming labor intensive and cost prohibitive, which is undesirable, especially considering that the likelihood of annotation errors increases with the scale/complexity of data collections. Therefore, we propose a new iterative clustering framework that is insensitive to annotation errors and to over-fitting ReID annotations (i.e., labels). Our proposed unsupervised framework incorporates (a) a novel multi-context group attention architecture that learns a holistic attention map from multiple local and global contexts, (b) an unsupervised clustering loss function that down-weights easily discriminative identities, and (c) a background diversity term that helps cluster persons across different cross-camera views without leveraging any identification or camera labels. We perform extensive analysis using the DukeMTMC-VideoReID and MARS video-based ReID datasets and the MSMT17 image-based ReID dataset. Our approach is shown to provide new state-of-the-art performance for unsupervised ReID, reducing the rank-1 performance gap between supervised and unsupervised ReID to 1.1%, 12.1%, and 21.9% from 6.1%, 17.9%, and 22.6% on the DukeMTMC, MARS, and MSMT17 datasets, respectively.
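To make the idea of learning a holistic attention map from multiple local and global contexts concrete, the following is a minimal PyTorch sketch, not the authors' implementation: the module name, the choice of context sizes, and the sigmoid fusion scheme are assumptions for illustration, built on standard context pooling over a CNN backbone feature map.

```python
# Illustrative sketch only: a multi-context attention module that pools a
# backbone feature map at several spatial scales (global and local contexts),
# scores each context, and fuses the scores into one holistic attention map.
# Names, context sizes, and the fusion scheme are assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiContextGroupedAttention(nn.Module):
    def __init__(self, in_channels: int, context_sizes=(1, 2, 4)):
        super().__init__()
        self.context_sizes = context_sizes
        # One lightweight 1x1 conv per context to score spatial positions.
        self.score_convs = nn.ModuleList(
            nn.Conv2d(in_channels, 1, kernel_size=1) for _ in context_sizes
        )
        # Fuse the per-context score maps into a single attention map.
        self.fuse = nn.Conv2d(len(context_sizes), 1, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map from a CNN backbone.
        b, c, h, w = x.shape
        score_maps = []
        for size, conv in zip(self.context_sizes, self.score_convs):
            # size=1 gives a global context; larger sizes give local contexts.
            ctx = F.adaptive_avg_pool2d(x, output_size=size)
            ctx = F.interpolate(ctx, size=(h, w), mode="bilinear", align_corners=False)
            score_maps.append(conv(ctx))
        # Holistic attention map in [0, 1], used to re-weight the features.
        attn = torch.sigmoid(self.fuse(torch.cat(score_maps, dim=1)))
        return x * attn


if __name__ == "__main__":
    feats = torch.randn(2, 256, 16, 8)        # dummy backbone features
    module = MultiContextGroupedAttention(256)
    out = module(feats)
    print(out.shape)                          # torch.Size([2, 256, 16, 8])
```

In an unsupervised pipeline of the kind described in the abstract, such attended features would feed an iterative clustering stage that assigns pseudo-labels, with the clustering loss and background diversity term shaping which clusters dominate training; those components are not sketched here because their exact forms are specific to the paper.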