Partitioned token fusion and pruning strategy for transformer tracking

IF 4.2 | Zone 3, Computer Science | JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Image and Vision Computing | Pub Date: 2025-02-01 | DOI: 10.1016/j.imavis.2025.105431

Chi Zhang, Yun Gao, Tao Meng, Tao Wang

Image and Vision Computing, volume 154, Article 105431. Publication type: Journal Article. Full text: https://www.sciencedirect.com/science/article/pii/S0262885625000198

Citations: 0

Abstract

Transformer-based tracking algorithms perform strongly in object tracking because they capture global information well. However, redundant background information in the search region introduces interference and raises the computational cost of locating the tracked object. To address this problem, we design a partitioned token fusion and pruning strategy for one-stream transformer trackers. The strategy strikes a better balance between information retention and interference reduction, improving tracking robustness while accelerating inference. Specifically, we partition the search tokens into high-correlation, medium-correlation, and low-correlation groups according to their relevance to the object template. The feature information of the medium-correlation tokens is fused into the high-correlation tokens, and the low-correlation tokens are discarded outright. This differentiated treatment reduces the number of tokens fed into the network, which lowers the transformer's computational cost, and it improves robustness by retaining the useful information in the medium-correlation features while down-weighting the background noise that accompanies them. The strategy has been evaluated on several challenging public benchmarks, and the results show that it achieves excellent overall performance compared with current state-of-the-art tracking methods.
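The partition, fuse, and prune steps described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how relevance is scored, where the group boundaries lie, or how fusion is weighted. The sketch assumes mean cosine similarity to the template tokens as the relevance score, fixed ratios (`high_ratio`, `medium_ratio`) for the group sizes, and a softmax-style weighted average that merges each medium-correlation token into its most similar high-correlation token. The function name `partition_fuse_prune` and all parameters are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    denom = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / denom if denom else 0.0


def partition_fuse_prune(search, template, high_ratio=0.5, medium_ratio=0.25):
    """Hypothetical sketch; returns (kept_tokens, kept_indices)."""
    n = len(search)
    # Assumed relevance score: mean cosine similarity to the template tokens.
    # A real one-stream tracker would more likely reuse attention weights.
    scores = [sum(cosine(s, t) for t in template) / len(template) for s in search]
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)

    # Assumed partition rule: fixed fractions of the ranked tokens.
    n_high = max(1, int(high_ratio * n))
    n_medium = int(medium_ratio * n)
    high = order[:n_high]
    medium = order[n_high:n_high + n_medium]
    # Tokens ranked after the medium group are low-correlation and are dropped.

    # Each high token starts as its own weighted sum, with weight exp(score).
    weighted = {i: [math.exp(scores[i]) * x for x in search[i]] for i in high}
    total = {i: math.exp(scores[i]) for i in high}

    # Fuse each medium token into its most similar high token. Its weight
    # exp(score) is never larger than that of any high token, so the
    # background content it carries is down-weighted.
    for m in medium:
        target = max(high, key=lambda h: cosine(search[m], search[h]))
        w = math.exp(scores[m])
        weighted[target] = [acc + w * x for acc, x in zip(weighted[target], search[m])]
        total[target] += w

    kept = sorted(high)  # restore the original token order
    tokens = [[x / total[i] for x in weighted[i]] for i in kept]
    return tokens, kept
```

Calling `partition_fuse_prune(search, template)` with lists of equal-length feature vectors returns fewer tokens than it was given, so the later transformer layers attend over a shorter sequence. The returned indices record which search positions survive, which a tracker would need in order to map the reduced sequence back onto the search region.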

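Source journal

Image and Vision Computing (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 8.50
Self-citation rate: 8.50%
Articles published: 143
Review time: 7.8 months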

Journal description: Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.