Chengao Zong, Xin Chen, Jie Zhao, Yang Liu, Huchuan Lu, Dong Wang
Title: Enhancing the Two-Stream Framework for Efficient Visual Tracking
DOI: 10.1109/TIP.2025.3598934
Journal: IEEE Transactions on Image Processing, vol. 34, pp. 5500-5512
Publication date: 2025-08-20 (Journal Article)
URL: https://ieeexplore.ieee.org/document/11131531/
Citations: 0
Abstract
Practical deployments, especially on resource-limited edge devices, demand high speed from visual object trackers. To meet this demand, we introduce a new efficient tracker with a two-stream architecture, named ToS. While the recent one-stream tracking framework, which employs a unified backbone to process the template and search region jointly, has demonstrated exceptional efficacy, we find that the conventional two-stream tracking framework, which employs two separate backbones for the template and search region, offers inherent advantages. The two-stream framework is more compatible with advanced lightweight backbones and can efficiently exploit large templates. We demonstrate that, through strategic design, the two-stream setup can exceed one-stream tracking models in both speed and accuracy. Our methodology rejuvenates the two-stream tracking paradigm with lightweight pre-trained backbones and three proposed efficiency-oriented strategies: 1) a feature-aggregation module that improves the representation capability of the backbone, 2) a channel-wise approach to feature fusion, offering a more effective and lighter alternative to spatial concatenation, and 3) an expanded template strategy that boosts tracking accuracy at negligible additional computational cost. Extensive evaluations across multiple tracking benchmarks demonstrate that the proposed method sets a new state of the art in efficient visual tracking.
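To make the efficiency argument behind strategy 2 concrete, here is a minimal numpy sketch (not the authors' code; all shapes, the mean-pooling step, and the projection matrix are illustrative assumptions) contrasting the two fusion styles: spatial concatenation appends template tokens to the search sequence, so attention layers must run over a longer sequence, whereas a channel-wise fusion joins a pooled template descriptor along the channel axis, leaving the sequence length unchanged.

```python
import numpy as np

C = 256                                  # feature channels (illustrative)
search = np.random.randn(16 * 16, C)     # 256 search-region tokens
template = np.random.randn(8 * 8, C)     # 64 template tokens

# Spatial concatenation: template tokens join the search sequence, so any
# subsequent attention layer attends over 256 + 64 = 320 tokens, and its
# cost grows quadratically with that length.
spatial = np.concatenate([search, template], axis=0)        # (320, C)

# Channel-wise fusion (sketch): pool the template into one descriptor,
# broadcast it over the search tokens, concatenate along the channel axis,
# and project 2C back to C with a (random, stand-in) linear map. The
# sequence length stays at 256 regardless of template size.
t_vec = template.mean(axis=0)                               # (C,)
fused_in = np.concatenate(
    [search, np.broadcast_to(t_vec, search.shape)], axis=1) # (256, 2C)
W = np.random.randn(2 * C, C) / np.sqrt(2 * C)
fused = fused_in @ W                                        # (256, C)
```

Note how the channel-wise path also makes strategy 3 cheap: enlarging the template only changes the pooling input, not the fused sequence length, which is consistent with the abstract's claim of negligible extra cost from expanded templates.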