Spatial cascaded clustering and weighted memory for unsupervised person re-identification

IF 4.2 | Zone 3, Computer Science | JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Image and Vision Computing | Pub Date: 2025-03-03 | DOI: 10.1016/j.imavis.2025.105478

Jiahao Hong, Jialong Zuo, Chuchu Han, Ruochen Zheng, Ming Tian, Changxin Gao, Nong Sang

Volume 156, Article 105478
Publisher page: https://www.sciencedirect.com/science/article/pii/S0262885625000666

Citations: 0

Abstract


Spatial cascaded clustering and weighted memory for unsupervised person re-identification

Recent advancements in unsupervised person re-identification (re-ID) methods have demonstrated high performance by leveraging fine-grained local context, often referred to as part-based methods. However, many existing part-based methods rely on horizontal division to obtain local contexts, leading to misalignment issues caused by various human poses. Moreover, misalignment of semantic information within part features hampers the effectiveness of metric learning, thereby limiting the potential of part-based methods. These challenges result in under-utilization of part features in existing approaches. To address these issues, we introduce the Spatial Cascaded Clustering and Weighted Memory (SCWM) method. SCWM aims to parse and align more accurate local contexts for different human body parts while allowing the memory module to balance hard example mining and noise suppression. Specifically, we first analyze the issues of foreground omissions and spatial confusions in previous methods. We then propose foreground and space corrections to enhance the completeness and reasonableness of human parsing results. Next, we introduce a weighted memory and utilize two weighting strategies. These strategies address hard sample mining for global features and enhance noise resistance for part features, enabling better utilization of both global and part features. Extensive experiments conducted on Market-1501, DukeMTMC-reID and MSMT17 datasets validate the effectiveness of the proposed method over numerous state-of-the-art methods.
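The abstract describes a weighted memory with two weighting strategies, one oriented toward hard sample mining for global features and one toward noise resistance for part features, but it does not give the formulas. The following is only a minimal sketch of that general idea and is not the authors' implementation: the function `weighted_centroid`, the softmax-over-similarity weighting, the `mode` names, and the `temperature` parameter are all assumptions introduced here for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def weighted_centroid(features, centroid, mode, temperature=0.1):
    """Aggregate one cluster's instance features into a single memory entry.

    Hypothetical weighting scheme (an assumption, not taken from the paper):
      mode="hard":  softmax over negative cosine similarity, so samples far
                    from the current centroid dominate (hard sample mining).
      mode="clean": softmax over positive cosine similarity, so samples close
                    to the current centroid dominate (noise suppression).
    """
    if mode not in ("hard", "clean"):
        raise ValueError("mode must be 'hard' or 'clean'")

    feats = l2_normalize(np.asarray(features, dtype=float))
    center = l2_normalize(np.asarray(centroid, dtype=float))

    # Cosine similarity of every instance to the current cluster centroid.
    sims = feats @ center

    sign = -1.0 if mode == "hard" else 1.0
    logits = sign * sims / temperature
    logits = logits - logits.max()  # numerical stability for the softmax
    weights = np.exp(logits)
    weights = weights / weights.sum()

    # The weighted mean of the instances becomes the new memory entry.
    new_centroid = l2_normalize(weights @ feats)
    return new_centroid, weights


if __name__ == "__main__":
    cluster = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
    current = np.array([1.0, 0.0])
    for mode in ("hard", "clean"):
        _, w = weighted_centroid(cluster, current, mode=mode)
        print(mode, np.round(w, 3))
```

Under these assumptions, a global-feature memory would call the function with `mode="hard"` and a part-feature memory with `mode="clean"`, so that likely-misaligned part features contribute less to the stored entry.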


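Source journal

Image and Vision Computing (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 8.50
Self-citation rate: 8.50%
Articles published: 143
Review time: 7.8 months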

Journal description: Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.