Partitioned token fusion and pruning strategy for transformer tracking
Chi Zhang, Yun Gao, Tao Meng, Tao Wang
Image and Vision Computing, Vol. 154, Article 105431, February 2025
DOI: 10.1016/j.imavis.2025.105431
https://www.sciencedirect.com/science/article/pii/S0262885625000198
Transformer-based tracking algorithms have shown outstanding performance in object tracking owing to their strong ability to capture global information. However, the redundant background information in the search region introduces interference and incurs high computational cost when searching for the tracked object. To address this problem, we design a partitioned token fusion and pruning strategy for one-stream transformer trackers. The strategy strikes a better balance between retaining information and suppressing interference, improving tracking robustness while accelerating inference. Specifically, we partition the search tokens into high-correlation, medium-correlation, and low-correlation groups according to their relevance to the object template. The feature information of the medium-correlation tokens is fused into the high-correlation tokens, and the low-correlation tokens are discarded directly. This differentiated treatment reduces the number of tokens fed into the network, and with it the high computational cost of the transformer. It also improves tracking robustness, because the useful information in the medium-correlation tokens is retained while the weight of the background noise they carry is reduced. The proposed strategy has been evaluated extensively on several challenging public benchmarks, and the results show that our approach achieves excellent overall performance compared with current state-of-the-art tracking methods.
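The abstract does not specify how relevance to the template is scored, how large each group is, or exactly how medium-correlation features are fused. The Python sketch below is one hypothetical reading, not the authors' implementation: relevance scores are assumed to be given and positive (for example, template-to-search attention averaged over heads), the three groups are formed by rank using made-up ratios `high_ratio` and `low_ratio`, and each medium-correlation token is merged into its most cosine-similar high-correlation token through a relevance-weighted average. The function `partition_fuse_prune` and all of its parameters are illustrative names.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na, nb = math.sqrt(_dot(a, a)), math.sqrt(_dot(b, b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return _dot(a, b) / (na * nb)


def partition_fuse_prune(
    tokens: Sequence[Vector],
    scores: Sequence[float],
    high_ratio: float = 0.5,  # hypothetical share of high-correlation tokens
    low_ratio: float = 0.25,  # hypothetical share of low-correlation tokens
) -> Tuple[List[Vector], List[int]]:
    """Hypothetical sketch of partitioned token fusion and pruning.

    tokens: search-region token features.
    scores: assumed positive relevance of each search token to the template.
    Returns the fused high-correlation tokens and their original indices.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if any(s <= 0 for s in scores):
        raise ValueError("relevance scores must be positive")
    n = len(tokens)
    # Rank search tokens from most to least relevant to the template.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    n_high = max(1, round(n * high_ratio))
    n_low = min(round(n * low_ratio), n - n_high)  # never overlap the high group
    high = order[:n_high]
    medium = order[n_high:n - n_low]
    # order[n - n_low:] holds the low-correlation tokens; they are dropped.

    # Each high token starts as its own relevance-weighted feature sum.
    sums = {h: [scores[h] * x for x in tokens[h]] for h in high}
    weights = {h: scores[h] for h in high}
    for m in medium:
        # Merge the medium token into its most similar high-correlation token,
        # weighted by its (lower) relevance score so it contributes less.
        target = max(high, key=lambda h: _cosine(tokens[m], tokens[h]))
        sums[target] = [s + scores[m] * x for s, x in zip(sums[target], tokens[m])]
        weights[target] += scores[m]
    fused = [[s / weights[h] for s in sums[h]] for h in high]
    return fused, high


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.2], [-1.0, -1.0]]
    scores = [0.4, 0.3, 0.2, 0.1]
    fused, kept = partition_fuse_prune(tokens, scores)
    # kept == [0, 1]: token 2 is fused into token 0, token 3 is discarded.
    print(kept, fused)
```

In this toy run four search tokens shrink to two. Because self-attention cost grows quadratically with sequence length, passing fewer search tokens to later layers lowers their attention cost; the relevance-weighted average keeps part of each medium-correlation token's features without giving it its own slot, and its lower score limits how much any background content it carries can shift the fused feature.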
About the journal:
Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to foster a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.