Image-text semantic learning for unsupervised cross-resolution person re-identification
Authors: Fuqi Liu, Zhiqi Pang, Chunyu Wang
Journal: Expert Systems with Applications, Volume 286, Article 128092 (JCR Q1, Computer Science, Artificial Intelligence; Impact Factor 7.5)
DOI: 10.1016/j.eswa.2025.128092
Publication date: 2025-05-09
URL: https://www.sciencedirect.com/science/article/pii/S0957417425017130
Citations: 0
Image-text semantic learning for unsupervised cross-resolution person re-identification
Cross-resolution person re-identification (CR-ReID) focuses on matching person images of the same identity across different resolutions. Most existing CR-ReID methods rely on manually annotated identity labels for training. Although some researchers have proposed unsupervised CR-ReID (UCR-ReID) methods, the feature fusion techniques they rely on still require a large number of parameters and significant computational resources, limiting the widespread application of UCR-ReID technology. To address these issues, we propose an image-text semantic learning (ITSL) method, which incorporates text semantics to enhance recognition performance. During the testing phase, ITSL requires only a single encoder to obtain resolution-invariant features. Specifically, ITSL first learns text features based on a vision-language model, and then utilizes a dual semantic matching module to match inter-resolution positive clusters in both the image and text modalities. During the optimization process, ITSL not only incorporates an image semantic contrastive loss to facilitate cross-resolution alignment but also integrates a text semantic contrastive loss to leverage text semantics for promoting resolution-invariance learning. Additionally, we design random region downsampling in ITSL, which further enhances the model's robustness to resolution gaps through data augmentation. Experimental results on multiple cross-resolution datasets show that ITSL not only outperforms existing unsupervised methods while maintaining efficiency, but also approaches the performance of earlier supervised methods on certain datasets.
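The abstract mentions image and text semantic contrastive losses without giving their formulations. As an illustration only, the sketch below shows a generic cluster-level InfoNCE objective of the kind commonly used in unsupervised ReID, which pulls each feature toward its assigned cluster centroid and pushes it away from the others; the function name, temperature value, and normalisation choices are assumptions, not details from the paper.

```python
import numpy as np

def cluster_contrastive_loss(features, labels, centroids, temperature=0.05):
    """Generic cluster-level InfoNCE loss (illustrative sketch, not the
    paper's exact objective). Each L2-normalised feature is contrasted
    against all cluster centroids, with its own cluster as the positive.

    features:  N x D array of embeddings.
    labels:    length-N array of cluster ids in [0, K).
    centroids: K x D array of cluster centroids.
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    c = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    logits = f @ c.T / temperature                  # N x K similarity scores
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Negative log-likelihood of each sample's own cluster, averaged.
    return -log_probs[np.arange(len(labels)), labels].mean()
```

In an image-text setting, the same loss shape could be applied once to image features against image-cluster centroids and once to text features against text-cluster centroids; how ITSL actually combines the two modalities is specified in the paper itself.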
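Random region downsampling is likewise described only at a high level. A minimal NumPy sketch of the general idea, degrading a random rectangular region of the image to simulate a local resolution gap, is shown below; the region size, scale range, and nearest-neighbour resampling are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

def random_region_downsampling(img, scale_range=(2, 4), region_frac=0.5, rng=None):
    """Downsample a random region of `img` and resize it back in place,
    simulating a resolution gap (illustrative sketch; parameters assumed).

    img: H x W x C uint8 image array. Returns a new augmented array.
    """
    if rng is None:
        rng = np.random.default_rng()
    h, w = img.shape[:2]
    rh, rw = int(h * region_frac), int(w * region_frac)
    top = int(rng.integers(0, h - rh + 1))
    left = int(rng.integers(0, w - rw + 1))
    scale = int(rng.integers(scale_range[0], scale_range[1] + 1))

    region = img[top:top + rh, left:left + rw]
    # Downsample by striding, then upsample by pixel repetition
    # (nearest-neighbour), cropping back to the original region size.
    low = region[::scale, ::scale]
    up = np.repeat(np.repeat(low, scale, axis=0), scale, axis=1)[:rh, :rw]

    out = img.copy()
    out[top:top + rh, left:left + rw] = up
    return out
```

Applied on the fly during training, an augmentation like this exposes the encoder to mixed-resolution inputs without requiring any extra annotation, which is consistent with the robustness motivation stated in the abstract.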
About the journal:
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.