TransSTC: transformer tracker meets efficient spatial-temporal cues

Hong Zhang, Wanli Xing, Yifan Yang, Hanyang Liu, Ding Yuan

DOI: 10.1016/j.patcog.2025.112303
Pattern Recognition, Volume 172, Article 112303 (published 2025-09-18)
Recently, researchers have started developing trackers that exploit the powerful global modeling capability of transformer networks. However, existing transformer trackers usually model all spatial cues in the template indiscriminately and ignore temporal cues about changes in the target's state. This distracts the tracker's attention, and the tracker gradually loses track of the target's latest state. We therefore propose a new tracker, TransSTC, which exploits effective spatial cues in the template and temporal cues gathered during tracking to improve performance. Specifically, we design a target-aware focused coding network that emphasizes effective spatial cues in the templates, alleviating the impact that spatial cues weakly associated with the target have on localization accuracy. Additionally, we employ a multi-temporal template update structure that accurately captures variations in the target's appearance. Within this structure, collected samples are assessed for target appearance similarity and environmental interference, followed by a three-level sample selection process to ensure accurate template updates. Finally, we introduce a motion constraint framework that dynamically adjusts the classification results based on the target's historical motion trajectory. Extensive experiments on seven tracking benchmarks demonstrate that TransSTC achieves competitive tracking performance.
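As an illustration only (not the authors' implementation), the sketch below shows one common way a motion constraint of the kind described in the abstract could reweight a classification response map using the target's historical trajectory: a constant-velocity prediction from recent target centers defines a Gaussian prior that suppresses responses far from the predicted position. The function name `motion_constrained_response`, the parameters `sigma` and `weight`, and the constant-velocity assumption are hypothetical choices, not details taken from the paper.

```python
import numpy as np

def motion_constrained_response(response, history, sigma=4.0, weight=0.3):
    """Hypothetical sketch: reweight a tracker's classification response map
    with a motion prior built from the target's recent trajectory.

    response : (H, W) array of classification scores on the search-region grid.
    history  : list of (row, col) target centers from previous frames,
               in the same grid coordinates as `response`.
    sigma    : spread of the Gaussian motion prior (grid cells); assumed value.
    weight   : blending factor between raw scores and the prior; assumed value.
    """
    h, w = response.shape
    if len(history) < 2:
        return response  # not enough history to estimate motion

    # Constant-velocity prediction from the last two observed centers.
    (r1, c1), (r2, c2) = history[-2], history[-1]
    pred_r = np.clip(2 * r2 - r1, 0, h - 1)
    pred_c = np.clip(2 * c2 - c1, 0, w - 1)

    # Gaussian prior centered at the predicted position.
    rows, cols = np.mgrid[0:h, 0:w]
    prior = np.exp(-((rows - pred_r) ** 2 + (cols - pred_c) ** 2) / (2 * sigma ** 2))

    # Blend: scores far from the predicted location are suppressed.
    return (1 - weight) * response + weight * response * prior


# Toy usage: a 16x16 response map with two peaks; the motion prior favors
# the peak that is consistent with the recent trajectory.
resp = np.zeros((16, 16))
resp[5, 5] = 1.0      # distractor peak
resp[10, 11] = 0.9    # peak consistent with motion history
hist = [(8, 9), (9, 10)]
adjusted = motion_constrained_response(resp, hist)
peak = np.unravel_index(adjusted.argmax(), adjusted.shape)
print(tuple(int(v) for v in peak))  # -> (10, 11)
```

In this toy example the stronger but trajectory-inconsistent peak is down-weighted, so the adjusted maximum falls on the location predicted by the recent motion; the actual TransSTC framework may realize the constraint quite differently.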
Journal Introduction:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.