Chengao Zong, Xin Chen, Jie Zhao, Yang Liu, Huchuan Lu, Dong Wang
Title: Enhancing the Two-Stream Framework for Efficient Visual Tracking
DOI: 10.1109/TIP.2025.3598934
Journal: IEEE Transactions on Image Processing, vol. 34, pp. 5500-5512
Publication date: 2025-08-20 (Journal Article)
URL: https://ieeexplore.ieee.org/document/11131531/
Citations: 0
Abstract
Practical deployments, especially on resource-limited edge devices, demand high speed from visual object trackers. To meet this demand, we introduce a new efficient tracker with a two-stream architecture, named ToS. While the recent one-stream tracking framework, which employs a unified backbone to process the template and search region jointly, has demonstrated exceptional efficacy, we find that the conventional two-stream tracking framework, which employs two separate backbones for the template and search region, offers inherent advantages. The two-stream framework is more compatible with advanced lightweight backbones and can efficiently exploit large templates. We demonstrate that, through strategic design, the two-stream setup can exceed one-stream tracking models in both speed and accuracy. Our methodology rejuvenates the two-stream tracking paradigm with lightweight pre-trained backbones and three proposed efficiency-oriented strategies: 1) a feature-aggregation module that improves the representation capability of the backbone, 2) a channel-wise approach to feature fusion, offering a more effective and lighter alternative to spatial concatenation, and 3) an expanded template strategy that boosts tracking accuracy at negligible additional computational cost. Extensive evaluations across multiple tracking benchmarks demonstrate that the proposed method sets a new state of the art in efficient visual tracking.
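To make the efficiency argument behind strategy 2 concrete, here is a minimal numpy sketch (not the authors' code; all shapes, the mean-pooling step, and the projection matrix are illustrative assumptions) contrasting the two fusion styles: spatial concatenation appends template tokens to the search sequence, so attention layers must run over a longer sequence, whereas a channel-wise fusion joins a pooled template descriptor along the channel axis, leaving the sequence length unchanged.

```python
import numpy as np

C = 256                                  # feature channels (illustrative)
search = np.random.randn(16 * 16, C)     # 256 search-region tokens
template = np.random.randn(8 * 8, C)     # 64 template tokens

# Spatial concatenation: template tokens join the search sequence, so any
# subsequent attention layer attends over 256 + 64 = 320 tokens, and its
# cost grows quadratically with that length.
spatial = np.concatenate([search, template], axis=0)        # (320, C)

# Channel-wise fusion (sketch): pool the template into one descriptor,
# broadcast it over the search tokens, concatenate along the channel axis,
# and project 2C back to C with a (random, stand-in) linear map. The
# sequence length stays at 256 regardless of template size.
t_vec = template.mean(axis=0)                               # (C,)
fused_in = np.concatenate(
    [search, np.broadcast_to(t_vec, search.shape)], axis=1) # (256, 2C)
W = np.random.randn(2 * C, C) / np.sqrt(2 * C)
fused = fused_in @ W                                        # (256, C)
```

Note how the channel-wise path also makes strategy 3 cheap: enlarging the template only changes the pooling input, not the fused sequence length, which is consistent with the abstract's claim of negligible extra cost from expanded templates.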