Noise-robust re-identification with triple-consistency perception

IF 4.2 | Zone 3, Computer Science | Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Image and Vision Computing Pub Date : 2024-07-28 DOI:10.1016/j.imavis.2024.105197

Citations: 0

Abstract

Traditional re-identification (ReID) methods rely heavily on clean, accurately annotated training data, which makes them susceptible to label noise in real-world scenarios. Although some noise-robust learning methods have been proposed and have achieved promising recognition performance, most of them are designed for image classification and are not suitable for ReID, which involves associating and matching objects rather than solely identifying them. To address this problem, we propose a Triple-consistency Perception based Noise-robust Re-identification Model (TcP-ReID), in which the model mines and focuses more on clean samples and on reliable relationships among samples from different perspectives. Specifically, the self-consistency strategy guides the model to emphasize and prioritize clean samples, thereby preventing overfitting to noisy labels during the initial stages of training. Rather than focusing solely on individual samples, the context-consistency loss exploits similarities between samples in the feature space, encouraging the prediction for each sample to align with those of its nearest neighbors. Moreover, to further strengthen the robustness of the model, a Jensen-Shannon divergence based cross-view consistency loss is introduced, encouraging the consistency of samples across different views. Extensive experiments demonstrate the superiority of the proposed TcP-ReID over competing methods under both instance-dependent and instance-independent noise. For instance, on the Market1501 dataset, our method achieves 85.8% rank-1 accuracy and 56.3% mAP (improvements of 5.6% and 8.7%) under instance-independent noise with a noise ratio of 50%, and similarly improves them by 5.7% and 1.4% under instance-dependent label noise.
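The abstract names two of the consistency terms without giving their formulas. The following is a minimal NumPy sketch of how such terms could look, not the authors' implementation. The function names, the cosine-similarity neighbor search, the choice of `k`, and the use of Jensen-Shannon divergence for the neighbor term are all assumptions made for illustration; only the use of Jensen-Shannon divergence for the cross-view term comes from the abstract.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def js_divergence(p, q, eps=1e-12):
    """Per-row Jensen-Shannon divergence (natural log) between p and q."""
    m = 0.5 * (p + q)
    kl_pm = np.sum(p * (np.log(p + eps) - np.log(m + eps)), axis=1)
    kl_qm = np.sum(q * (np.log(q + eps) - np.log(m + eps)), axis=1)
    return 0.5 * (kl_pm + kl_qm)


def cross_view_consistency(logits_a, logits_b):
    """Mean JS divergence between predictions for two views of each sample."""
    return js_divergence(softmax(logits_a), softmax(logits_b)).mean()


def context_consistency(features, logits, k=2):
    """Hypothetical neighbor term (an assumption, not the paper's formula):
    JS divergence between each sample's prediction and the mean prediction
    of its k nearest neighbors under cosine similarity."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T
    np.fill_diagonal(sim, -np.inf)              # a sample is not its own neighbor
    nn_idx = np.argsort(-sim, axis=1)[:, :k]    # indices of the k most similar samples
    probs = softmax(logits)
    neighbor_mean = probs[nn_idx].mean(axis=1)  # shape (n, k, c) -> (n, c)
    return js_divergence(probs, neighbor_mean).mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(8, 16))                       # toy feature vectors
    view_a = rng.normal(size=(8, 5))                       # toy logits, view A
    view_b = view_a + 0.1 * rng.normal(size=(8, 5))        # slightly perturbed view B
    print("cross-view term:", cross_view_consistency(view_a, view_b))
    print("context term:   ", context_consistency(feats, view_a, k=2))
```

Jensen-Shannon divergence is symmetric and bounded above by ln 2, so neither view is treated as the reference and the term cannot blow up the way a one-sided KL divergence can. That is one plausible reason to prefer it for a consistency loss.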


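Source journal

Image and Vision Computing (Engineering & Technology - Engineering: Electrical & Electronic)

CiteScore: 8.50

Self-citation rate: 8.50%

Articles published: 143

Review time: 7.8 months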

Journal description: Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.