Spatial-temporal graph-guided global attention network for video-based person re-identification

IF 2.4 | CAS Tier 4 (Computer Science) | JCR Q3 (Computer Science, Artificial Intelligence)
Xiaobao Li, Wen Wang, Qingyong Li, Jiang Zhang
{"title":"Spatial-temporal graph-guided global attention network for video-based person re-identification","authors":"Xiaobao Li, Wen Wang, Qingyong Li, Jiang Zhang","doi":"10.1007/s00138-023-01489-w","DOIUrl":null,"url":null,"abstract":"<p>Global attention learning has been extensively applied in video-based person re-identification due to its superiority in capturing contextual correlations. However, existing global attention learning methods usually adopt the conventional neural network to model non-Euclidean contextual correlations, resulting in a limited representation ability. Inspired by the graph-structure property of the contextual correlations, we propose a spatial-temporal graph-guided global attention network (STG<span>\\(^3\\)</span>A) for video-based person re-identification. STG<span>\\(^3\\)</span>A comprises two graph-guided attention modules to capture the spatial contexts within a frame and temporal contexts across all frames in a sequence for global attention learning. Furthermore, the graphs from both modules are encoded as graph representations, which combine with weighted representations to grasp the spatial-temporal contextual information adequately for video feature learning. To reduce the effect of noisy graph nodes and learn robust graph representations, a graph node attention is developed to trade-off the importance of each graph node, leading to noise-tolerant graph models. Finally, we design a graph-guided fusion scheme to integrate the representations output by these two attentive modules for a more compact video feature. Extensive experiments on MARS and DukeMTMCVideoReID datasets demonstrate the superior performance of the STG<span>\\(^3\\)</span>A.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"55 1","pages":""},"PeriodicalIF":2.4000,"publicationDate":"2023-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Machine Vision and Applications","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s00138-023-01489-w","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Global attention learning has been widely applied to video-based person re-identification because of its strength in capturing contextual correlations. However, existing global attention learning methods usually adopt conventional neural networks to model non-Euclidean contextual correlations, which limits their representation ability. Inspired by the graph-structured nature of these contextual correlations, we propose a spatial-temporal graph-guided global attention network (STG³A) for video-based person re-identification. STG³A comprises two graph-guided attention modules that capture the spatial contexts within a frame and the temporal contexts across all frames of a sequence for global attention learning. Furthermore, the graphs from both modules are encoded as graph representations, which are combined with the attention-weighted representations to fully capture spatial-temporal contextual information for video feature learning. To reduce the effect of noisy graph nodes and learn robust graph representations, a graph node attention mechanism is developed to trade off the importance of each graph node, yielding noise-tolerant graph models. Finally, we design a graph-guided fusion scheme that integrates the representations output by the two attention modules into a more compact video feature. Extensive experiments on the MARS and DukeMTMC-VideoReID datasets demonstrate the superior performance of STG³A.
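To make the architecture described above concrete, the sketch below shows one plausible way to structure a graph-guided attention module with graph node attention in PyTorch, plus a gated fusion of the spatial and temporal graph representations. All names, dimensions, and design choices here (the soft adjacency built from pairwise affinities, the single message-passing step, the sigmoid attention, the learned fusion gate) are illustrative assumptions, not the authors' exact formulation.

```python
# Minimal, self-contained sketch of the abstract's ideas (assumed design).
import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphGuidedAttention(nn.Module):
    """One graph-guided attention module over N nodes: spatial regions of a
    frame for the spatial module, or frames of a sequence for the temporal
    module. Returns attention-weighted node features and a pooled graph
    representation weighted by graph node attention."""

    def __init__(self, dim: int):
        super().__init__()
        self.proj_q = nn.Linear(dim, dim)
        self.proj_k = nn.Linear(dim, dim)
        self.gcn = nn.Linear(dim, dim)       # one graph-propagation step
        self.to_attn = nn.Linear(dim, 1)     # global attention logits
        self.node_attn = nn.Linear(dim, 1)   # graph node attention logits

    def forward(self, x):
        # x: (B, N, C) node features.
        # Soft adjacency from scaled pairwise affinities, modeling the
        # non-Euclidean contextual correlations as a graph.
        adj = torch.softmax(
            self.proj_q(x) @ self.proj_k(x).transpose(1, 2) / x.size(-1) ** 0.5,
            dim=-1)                                     # (B, N, N)
        h = F.relu(self.gcn(adj @ x))                   # message passing, (B, N, C)
        weighted = torch.sigmoid(self.to_attn(h)) * x   # attention-weighted features
        # Graph node attention down-weights noisy nodes before pooling.
        node_w = torch.softmax(self.node_attn(h), dim=1)  # (B, N, 1)
        graph_repr = (node_w * h).sum(dim=1)            # (B, C)
        return weighted, graph_repr


# Usage sketch: a spatial module over H*W regions per frame, a temporal
# module over T frames, and a learned gate standing in for the paper's
# graph-guided fusion scheme.
B, T, C, H, W = 2, 8, 256, 8, 4
feats = torch.randn(B, T, C, H, W)

spatial = GraphGuidedAttention(C)
temporal = GraphGuidedAttention(C)
gate = nn.Linear(2 * C, C)

# Spatial: each frame's H*W locations are graph nodes.
s_in = feats.permute(0, 1, 3, 4, 2).reshape(B * T, H * W, C)
_, s_graph = spatial(s_in)
s_graph = s_graph.view(B, T, C).mean(dim=1)             # pool over frames, (B, C)

# Temporal: the T frame descriptors are graph nodes.
t_in = feats.mean(dim=(3, 4))                           # (B, T, C)
_, t_graph = temporal(t_in)

# Gated fusion of the two graph representations into one video feature.
g = torch.sigmoid(gate(torch.cat([s_graph, t_graph], dim=-1)))
video_feat = g * s_graph + (1 - g) * t_graph            # (B, C)
print(video_feat.shape)                                 # torch.Size([2, 256])
```

The gate simply blends the spatial and temporal graph representations; the actual fusion scheme may also incorporate the attention-weighted features, which the abstract does not specify.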

Source journal: Machine Vision and Applications (Engineering & Technology; Engineering: Electrical & Electronic)
CiteScore: 6.30
Self-citation rate: 3.00%
Articles per year: 84
Review time: 8.7 months
About the journal: Machine Vision and Applications publishes high-quality technical contributions in machine vision research and development. Specifically, the editors encourage submissions in all applications and engineering aspects of image-related computing. In particular, original contributions dealing with scientific, commercial, industrial, military, and biomedical applications of machine vision are all within the scope of the journal. Particular emphasis is placed on engineering and technology aspects of image processing and computer vision. The following aspects of machine vision applications are of interest: algorithms, architectures, VLSI implementations, AI techniques and expert systems for machine vision, front-end sensing, multidimensional and multisensor machine vision, real-time techniques, image databases, virtual reality and visualization. Papers must include a significant experimental validation component.