Object tracking based on temporal and spatial context information

IF 4.2 · Zone 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Image and Vision Computing · Pub Date: 2025-03-12 · DOI: 10.1016/j.imavis.2025.105488

Yan Chen, Tao Lin, Jixiang Du, Hongbo Zhang

Volume 157, Article 105488 · Journal Article · https://www.sciencedirect.com/science/article/pii/S0262885625000769

Citations: 0

Abstract

Many advanced trackers improve stability by optimizing the target's visual appearance model or by improving the interaction between the template and the search area. Despite these advances, appearance-based trackers still depend primarily on the target's visual information and do not adequately integrate spatio-temporal context, which limits their effectiveness when similar objects surround the target. To address this challenge, this work introduces TSCTrack, an object tracking method that leverages spatio-temporal context information. TSCTrack overcomes the shortcomings of traditional center-cropping preprocessing by introducing Global Spatial Position Embedding, which preserves spatial information and captures the target's motion. TSCTrack also incorporates a Spatial Relationship Aggregation module and a Temporal Relationship Aggregation module: the former captures static spatial context in each frame, while the latter integrates dynamic temporal context. This integration allows the Dynamic Tracking Prediction module to generate precise target coordinates, reducing the impact of target deformation and scale change on tracking performance. Evaluated on multiple public tracking datasets, including LaSOT, TrackingNet, UAV123, GOT-10k, and OTB, TSCTrack is reported to achieve superior performance across diverse scenarios.
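
The abstract does not specify how Global Spatial Position Embedding is computed, so the following is only a minimal sketch of the general idea, not the TSCTrack implementation. It assumes that a cropped search region can carry its location in the full frame as a sinusoidal encoding of its normalized box corners, and that motion can be approximated by the displacement of box centers between frames. The function names, the encoding scale, and the `dim_per_coord` parameter are all hypothetical choices made for illustration.

```python
import math


def sinusoidal_encoding(value, dim):
    """Encode a scalar in [0, 1] as `dim` sine/cosine features.

    Hypothetical helper: the scale factor and frequency schedule are
    illustrative assumptions, not taken from the paper.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    features = []
    for i in range(dim // 2):
        freq = 1.0 / (10000 ** (2 * i / dim))
        angle = value * 100.0 * freq
        features.append(math.sin(angle))
        features.append(math.cos(angle))
    return features


def global_position_embedding(crop_box, frame_size, dim_per_coord=8):
    """Embed where a crop sits in the full frame.

    crop_box is (x1, y1, x2, y2) in pixels; frame_size is (width, height).
    Center cropping alone discards this information, because every crop
    is re-centered on the target.
    """
    x1, y1, x2, y2 = crop_box
    width, height = frame_size
    normalized = (x1 / width, y1 / height, x2 / width, y2 / height)
    embedding = []
    for coord in normalized:
        embedding.extend(sinusoidal_encoding(coord, dim_per_coord))
    return embedding


def center_displacement(prev_box, curr_box):
    """Return the (dx, dy) shift of the box center between two frames."""
    prev_cx = (prev_box[0] + prev_box[2]) / 2
    prev_cy = (prev_box[1] + prev_box[3]) / 2
    curr_cx = (curr_box[0] + curr_box[2]) / 2
    curr_cy = (curr_box[1] + curr_box[3]) / 2
    return (curr_cx - prev_cx, curr_cy - prev_cy)


# Two crops of the same size at different places in a 1920x1080 frame.
frame = (1920, 1080)
crop_a = (100, 100, 356, 356)
crop_b = (800, 400, 1056, 656)
emb_a = global_position_embedding(crop_a, frame)
emb_b = global_position_embedding(crop_b, frame)
print(len(emb_a), emb_a != emb_b)
print(center_displacement(crop_a, crop_b))
```

In this sketch, two crops that would look identical after center cropping still receive different embeddings because their frame-level coordinates differ, which is the kind of spatial information the abstract says is lost by center cropping alone.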


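Source journal

Image and Vision Computing (Engineering and Technology; Engineering: Electrical and Electronic)

CiteScore: 8.50
Self-citation rate: 8.50%
Articles published: 143
Review time: 7.8 months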

Journal description: Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.