FSTrack: Visual tracking with feature fusion and adaptive selection

IF 7.5 | Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Expert Systems with Applications Pub Date : 2025-10-01 DOI:10.1016/j.eswa.2025.129895

Jian Shi, Yang Yu, Bin Hui, Junze Shi, Haibo Luo

Volume 298, Article 129895 | Journal Article | https://www.sciencedirect.com/science/article/pii/S0957417425035109

Citations: 0

Abstract

Visual object tracking is a core research area in computer vision, with significant applications in security surveillance, autonomous navigation, and other fields. During tracking, distractors and target appearance variations arise frequently, which makes relying solely on the initial template unreliable. The effective integration of spatiotemporal information with search region features therefore plays a crucial role in robust long-term single-object tracking. However, most existing methods indiscriminately incorporate all historical features as spatiotemporal context, potentially introducing irrelevant or redundant information that undermines tracking reliability. To address this limitation while exploiting backbone features more effectively, we propose FSTrack, which uses feature fusion to enhance search features and adaptive feature selection to strengthen spatiotemporal features. First, we integrate multi-level backbone features through feature fusion and increase feature resolution, thereby fully exploiting the multi-scale features of the backbone network. Second, we introduce an adaptive feature selection mechanism that dynamically identifies and emphasizes discriminative historical features, making spatiotemporal modeling more robust across diverse tracking scenarios. Third, we propose a globally contextual prediction head that overcomes the limited receptive field inherent in conventional CNN-based heads and further improves overall performance. Extensive experiments demonstrate the superiority of FSTrack. On mainstream benchmark datasets such as GOT-10k, TrackingNet, and LaSOT, our approach outperforms mainstream models that use the same or higher-resolution inputs in both speed and accuracy, achieving state-of-the-art results on tracking benchmarks.
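The abstract does not say how the adaptive feature selection scores or picks historical features, so the following is only a minimal sketch of the general idea, not the authors' mechanism. It assumes that each historical frame is summarized by one feature vector, that relevance is the cosine similarity to the current search feature, and that only the top-k entries are kept and softmax-weighted. All function names, the parameter `k`, and the toy vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_history(query, history, k=2):
    """Assumed selection rule: keep the k historical vectors most similar to the query.

    Returns the selected indices (most similar first) and their softmax weights.
    """
    scores = [cosine(query, h) for h in history]
    top = sorted(range(len(history)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in top])
    return top, weights


def fuse_context(history, top, weights):
    """Weighted sum of the selected historical vectors, used as spatiotemporal context."""
    dim = len(history[0])
    return [sum(w * history[i][d] for i, w in zip(top, weights)) for d in range(dim)]


# Toy data (hypothetical): one current search feature and four historical features.
query = [1.0, 0.0, 0.0]
history = [
    [0.9, 0.1, 0.0],  # close to the current target appearance
    [0.0, 1.0, 0.0],  # unrelated, e.g. a distractor
    [0.8, 0.0, 0.2],  # also close to the current appearance
    [0.0, 0.0, 1.0],  # unrelated
]

top, weights = select_history(query, history, k=2)
context = fuse_context(history, top, weights)
print("selected indices:", top)
print("weights:", weights)
print("context:", context)
```

Compared with averaging every historical feature, this kind of selection keeps the two unrelated entries out of the context vector entirely, which is the failure mode the abstract attributes to methods that use all historical features indiscriminately.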




Source journal: Expert Systems with Applications (Engineering & Technology; Engineering: Electrical & Electronic)

CiteScore: 13.80
Self-citation rate: 10.60%
Publications: 2045
Review time: 8.7 months

About the journal: Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.