Spatio-temporal Feature-level Augmentation Vision Transformer for video-based person re-identification

Impact factor 7.5 | Zone 1, Computer Science | JCR Q1, Computer Science, Artificial Intelligence
Minjung Kim, MyeongAh Cho, Heansung Lee, Sangyoun Lee
Journal: Pattern Recognition, vol. 168, Article 111813
Published: 2025-05-24
DOI: 10.1016/j.patcog.2025.111813
URL: https://www.sciencedirect.com/science/article/pii/S003132032500473X

Abstract

Video-based person re-identification (ReID) aims to match an individual across multiple videos, a task central to security applications of computer vision. While previous transformer-based approaches have used various means to enhance performance, the growing complexity of network designs makes it hard to meet the practical requirements of intelligent surveillance systems. To improve network efficiency, we introduce a Feature-level Augmentation Vision Transformer (FAViT), which reinterprets the attributes of video ReID. We leverage the fact that a person's identity stays the same even when backgrounds change or multiple persons appear in video frames. First, we introduce Token Representation Learning to distinguish foreground from background. We also employ spatio-temporal feature-level augmentation, together with Altered Background ID classification and Anomaly Frame Detection, to strengthen the representation capacity of the transformer. Extensive experiments across five benchmarks validate the effectiveness of FAViT, which incurs the least computational overhead among transformer-based models. Further analyses substantiate the model's generalization ability.
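The abstract does not spell out how the feature-level augmentation is implemented, so the following is only a rough sketch of the general idea, not the authors' method. It assumes that each clip is already encoded as per-frame token vectors and that a boolean foreground mask is available; the function names, the mask, and the token shapes are all hypothetical. The first function keeps the foreground tokens of one clip and borrows background tokens from another (an altered-background sample that should keep the same identity label), and the second swaps one frame for a frame from another clip and returns per-frame labels that a detection head could be trained on.

```python
import random


def swap_background_tokens(tokens_a, tokens_b, fg_mask_a):
    """Keep clip A's foreground tokens; take background tokens from clip B.

    tokens_a, tokens_b: nested lists shaped [frames][tokens][dim].
    fg_mask_a: nested list [frames][tokens] of bools, True = foreground.
    The mask is an assumption here; the abstract only says that foreground
    and background are distinguished, not how.
    """
    return [
        [tok_a if is_fg else tok_b
         for tok_a, tok_b, is_fg in zip(frame_a, frame_b, mask)]
        for frame_a, frame_b, mask in zip(tokens_a, tokens_b, fg_mask_a)
    ]


def insert_anomaly_frame(tokens_a, tokens_b, rng):
    """Replace one randomly chosen frame of clip A with the same-index frame of clip B.

    Returns the augmented clip and per-frame labels (1 = swapped-in frame).
    """
    idx = rng.randrange(len(tokens_a))
    augmented = list(tokens_a)          # shallow copy; frames are not modified
    augmented[idx] = tokens_b[idx]
    labels = [1 if t == idx else 0 for t in range(len(tokens_a))]
    return augmented, labels


def random_clip(rng, num_frames, num_tokens, dim):
    """Stand-in for transformer token features; purely synthetic data."""
    return [[[rng.gauss(0.0, 1.0) for _ in range(dim)]
             for _ in range(num_tokens)]
            for _ in range(num_frames)]


rng = random.Random(0)
T, N, D = 4, 6, 8                        # frames, tokens per frame, feature dim
clip_a = random_clip(rng, T, N, D)
clip_b = random_clip(rng, T, N, D)
fg_mask = [[rng.random() > 0.5 for _ in range(N)] for _ in range(T)]

altered = swap_background_tokens(clip_a, clip_b, fg_mask)
anomalous, frame_labels = insert_anomaly_frame(clip_a, clip_b, rng)
```

Operating on tokens rather than pixels means the augmented samples need no extra image decoding or re-encoding, which fits the abstract's emphasis on low computational overhead, although the actual design in the paper may differ.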
Source journal
Pattern Recognition (Engineering: Electrical and Electronic)
CiteScore: 14.40
Self-citation rate: 16.20%
Articles published: 683
Review time: 5.6 months
Journal description: The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.