Enhancing Visual Analysis in Person Re-Identification With Vision-Language Models.

IF 1.4 · CAS Zone 4 (Computer Science) · JCR Q3 (COMPUTER SCIENCE, SOFTWARE ENGINEERING)
Wang Xia, Tianci Wang, Jiawei Li, Guodao Sun, Haidong Gao, Xu Tan, Ronghua Liang
Journal: IEEE Computer Graphics and Applications (volume PP)
DOI: 10.1109/MCG.2025.3593227 (https://doi.org/10.1109/MCG.2025.3593227)
Publication date: 2025-07-28
Publication type: Journal Article
Open access: no
Citations: 0

Abstract

Image-based person re-identification aims to match individuals across multiple cameras. Despite advances in machine learning, the effectiveness of existing methods in real-world scenarios remains limited, often leaving users to handle fine-grained matching manually. Recent work has explored textual information as auxiliary cues, but existing methods generate coarse descriptions and fail to integrate them effectively into retrieval workflows. To address these issues, we adopt a vision-language model fine-tuned with domain-specific knowledge to generate detailed textual descriptions and keywords for pedestrian images. We then create a joint search space combining visual and textual information, using image clustering and keyword co-occurrence to build a semantic layout. Additionally, we introduce a dynamic spiral word cloud algorithm to improve visual presentation and enhance semantic associations. Finally, we conduct case studies and a user study and collect expert feedback, demonstrating the usability and effectiveness of our system.
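The keyword co-occurrence mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `keyword_cooccurrence` and the sample keyword lists are hypothetical, and the sketch assumes only that each pedestrian image comes with a list of generated keywords. It counts how often two keywords are attached to the same image, which is one common way to derive the pairwise weights a semantic layout could use.

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(keyword_lists):
    """Count how often each unordered keyword pair appears on the same image."""
    counts = Counter()
    for keywords in keyword_lists:
        # Deduplicate and sort so each pair is counted once per image
        # and always stored in the same (alphabetical) order.
        unique = sorted(set(keywords))
        for first, second in combinations(unique, 2):
            counts[(first, second)] += 1
    return counts


# Hypothetical keyword lists, one per pedestrian image (illustrative data only).
example_keywords = [
    ["red jacket", "backpack", "jeans"],
    ["red jacket", "backpack"],
    ["jeans", "backpack", "backpack"],
]

cooccurrence = keyword_cooccurrence(example_keywords)
for pair, count in cooccurrence.most_common():
    print(pair, count)
```

Pairs with higher counts would typically be placed closer together in a layout; how the paper actually weights or positions keywords is not stated in the abstract.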

Source journal
IEEE Computer Graphics and Applications (Engineering & Technology - Computer Science: Software Engineering)
CiteScore: 3.20
Self-citation rate: 5.60%
Articles published: 160
Review time: >12 weeks
Journal description: IEEE Computer Graphics and Applications (CG&A) bridges the theory and practice of computer graphics, visualization, virtual and augmented reality, and HCI. From specific algorithms to full system implementations, CG&A offers a unique combination of peer-reviewed feature articles and informal departments. Theme issues guest edited by leading researchers in their fields track the latest developments and trends in computer-generated graphical content, while tutorials and surveys provide a broad overview of interesting and timely topics. Regular departments further explore the core areas of graphics as well as extend into topics such as usability, education, history, and opinion. In each issue, the cover story focuses on creative applications of the technology by an artist or designer. Published six times a year, CG&A is indispensable reading for people working at the leading edge of computer-generated graphics technology and its applications in everything from business to the arts.