Enhancing Visual Analysis in Person Re-Identification With Vision-Language Models

IF 1.4; Zone 4, Computer Science; JCR Q3, COMPUTER SCIENCE, SOFTWARE ENGINEERING

IEEE Computer Graphics and Applications. Pub Date: 2025-07-28. DOI: 10.1109/MCG.2025.3593227

Wang Xia, Tianci Wang, Jiawei Li, Guodao Sun, Haidong Gao, Xu Tan, Ronghua Liang

Citations: 0

Abstract

Image-based person re-identification aims to match individuals across multiple cameras. Despite advances in machine learning, the effectiveness of such methods in real-world scenarios remains limited, often leaving users to handle fine-grained matching manually. Recent work has explored textual information as auxiliary cues, but existing methods generate coarse descriptions and fail to integrate them effectively into retrieval workflows. To address these issues, we adopt a vision-language model fine-tuned with domain-specific knowledge to generate detailed textual descriptions and keywords for pedestrian images. We then create a joint search space combining visual and textual information, using image clustering and keyword co-occurrence to build a semantic layout. Additionally, we introduce a dynamic spiral word cloud algorithm to improve visual presentation and enhance semantic associations. Finally, case studies, a user study, and expert feedback demonstrate the usability and effectiveness of our system.
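
The abstract mentions keyword co-occurrence as one input to the semantic layout but gives no detail on how it is computed. As a rough illustration only, and not the authors' implementation, the sketch below counts how often two keywords are attached to the same image. The function name `keyword_cooccurrence` and the sample keywords are hypothetical; in the described system the keywords would come from the fine-tuned vision-language model.

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(keyword_lists):
    """Count how often each unordered keyword pair appears on the same image."""
    counts = Counter()
    for keywords in keyword_lists:
        # Deduplicate and sort so every pair has one canonical ordering.
        unique = sorted(set(keywords))
        counts.update(combinations(unique, 2))
    return counts


# Hypothetical per-image keywords (assumed data, not taken from the paper).
images = [
    ["red jacket", "backpack", "jeans"],
    ["red jacket", "jeans"],
    ["backpack", "white shoes"],
]

pairs = keyword_cooccurrence(images)
for pair, n in pairs.most_common():
    print(pair, n)
```

Pairs with higher counts could be placed closer together in a layout; how the paper actually turns co-occurrence into positions is not stated in the abstract.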

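Source journal

IEEE Computer Graphics and Applications (Engineering & Technology, Computer Science: Software Engineering)

CiteScore: 3.20

Self-citation rate: 5.60%

Articles published: 160

Review time: >12 weeks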

About the journal: IEEE Computer Graphics and Applications (CG&A) bridges the theory and practice of computer graphics, visualization, virtual and augmented reality, and HCI. From specific algorithms to full system implementations, CG&A offers a unique combination of peer-reviewed feature articles and informal departments. Theme issues guest edited by leading researchers in their fields track the latest developments and trends in computer-generated graphical content, while tutorials and surveys provide a broad overview of interesting and timely topics. Regular departments further explore the core areas of graphics as well as extend into topics such as usability, education, history, and opinion. Each issue, the story of our cover focuses on creative applications of the technology by an artist or designer. Published six times a year, CG&A is indispensable reading for people working at the leading edge of computer-generated graphics technology and its applications in everything from business to the arts.