Spatio-temporal Feature-level Augmentation Vision Transformer for video-based person re-identification

Minjung Kim, MyeongAh Cho, Heansung Lee, Sangyoun Lee

Pattern Recognition, Volume 168, Article 111813 (Q1, Computer Science, Artificial Intelligence; IF 7.5). DOI: 10.1016/j.patcog.2025.111813. Published online 2025-05-24. Available at: https://www.sciencedirect.com/science/article/pii/S003132032500473X
Citations: 0
Abstract
Video-based person re-identification (ReID) aims to match an individual across multiple videos, a task central to security applications of computer vision. While previous transformer-based approaches have used various means to enhance performance, the increasing complexity of their network designs makes it difficult to meet the practical requirements of intelligent surveillance systems. To improve network efficiency, we introduce a Feature-level Augmentation Vision Transformer (FAViT), which reinterprets the attributes of video ReID. We leverage the property that identity is preserved even when backgrounds change or multiple persons appear in video frames. First, we introduce Token Representation Learning to distinguish foreground from background. We then apply spatio-temporal feature-level augmentation, together with Altered Background ID classification and Anomaly Frame Detection, to strengthen the representation capacity of the transformer. Extensive experiments across five benchmarks validate the effectiveness of FAViT, which incurs the least computational overhead among transformer-based models. We further substantiate our model's generalization ability through analyses.
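The core idea of feature-level background augmentation as described above can be sketched in code. The snippet below is a minimal illustration, not the paper's implementation: it assumes patch tokens have already been split into foreground and background by a token-level mask (the paper's Token Representation Learning), and it swaps background tokens between samples in a batch, returning a donor index that could serve as an "Altered Background ID" classification target. Function and variable names are hypothetical.

```python
import numpy as np

def augment_background_tokens(tokens, fg_mask, seed=None):
    """Hypothetical sketch of feature-level background augmentation.

    tokens:  (B, N, D) patch-token features for a batch of frames.
    fg_mask: (B, N) boolean mask, True where a token is foreground (person).
    Returns (augmented_tokens, donor): each sample's background tokens are
    replaced by those of another sample ("donor"); the donor index can be
    used as the label for an altered-background classification head.
    """
    rng = np.random.default_rng(seed)
    B = tokens.shape[0]
    # Shift every index by a random non-zero offset so no sample donates
    # its own background back to itself.
    donor = (np.arange(B) + rng.integers(1, B)) % B
    out = tokens.copy()
    bg = ~fg_mask  # background token positions
    for i in range(B):
        out[i, bg[i]] = tokens[donor[i], bg[i]]
    return out, donor
```

Because identity is carried by the foreground tokens, the augmented sequence keeps its original person ID while its background changes, which is exactly the invariance the abstract exploits.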
About the journal
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.