Global–local feature-mixed network with template update for visual tracking

IF 3.9 | Zone 3, Computer Science | Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Letters Pub Date : 2025-02-01 DOI:10.1016/j.patrec.2024.11.034

Li Zhao , Chenxiang Fan , Min Li , Zhonglong Zheng , Xiaoqin Zhang

Volume 188, Pages 111-116. Full text: https://www.sciencedirect.com/science/article/pii/S0167865524003465

Citations: 0

Abstract

Deep learning trackers have succeeded thanks to their powerful local and global feature extraction capacity. However, neither Siamese-based trackers, which rely on local convolution, nor Transformer-based trackers, which rely on a global Transformer, fully exploit the information in each frame. As a result, these trackers lose accuracy when the target's appearance changes. This paper proposes a global–local feature-mixed tracker, named GLT, that combines the complementary advantages of global and local frame features. GLT uses depth-wise convolution with dynamic weights to extract local features and a residual Transformer to extract global features. By drawing on both global and local details, our method tracks accurately and robustly. GLT also includes a template update strategy based on key frames to handle the challenges of long-term tracking. Extensive experiments show that GLT achieves excellent performance on short-term and long-term benchmarks, including GOT-10k, TrackingNet and LaSOT. Furthermore, because it avoids the many attention operations used by other Transformer-based trackers, GLT has fewer parameters and runs in real time.
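
The abstract mentions a template update strategy based on key frames but does not spell out the rule. The sketch below is one plausible reading, not the authors' method: within a fixed window of frames it remembers the most confident tracked crop and, at the end of the window, promotes that crop to the new template only if its confidence is high enough. The class name `KeyFrameTemplateUpdater`, the `interval` and `min_confidence` parameters, and their default values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class KeyFrameTemplateUpdater:
    """Hypothetical key-frame template updater (not the paper's exact rule)."""

    initial_template: Any
    interval: int = 50           # assumed: number of frames per update window
    min_confidence: float = 0.7  # assumed: minimum score for a usable key frame

    def __post_init__(self) -> None:
        self.dynamic_template = self.initial_template
        self._best_score = float("-inf")
        self._best_crop: Optional[Any] = None
        self._frames_seen = 0

    def observe(self, crop: Any, confidence: float) -> bool:
        """Record one tracked frame; return True if the template was replaced."""
        self._frames_seen += 1
        # Remember the most confident frame seen in the current window.
        if confidence > self._best_score:
            self._best_score = confidence
            self._best_crop = crop
        if self._frames_seen < self.interval:
            return False
        # End of window: promote the best frame to key frame if it is reliable.
        updated = self._best_score >= self.min_confidence
        if updated:
            self.dynamic_template = self._best_crop
        # Start a new window whether or not the template changed.
        self._best_score = float("-inf")
        self._best_crop = None
        self._frames_seen = 0
        return updated


if __name__ == "__main__":
    updater = KeyFrameTemplateUpdater(initial_template="frame0", interval=3)
    for name, score in [("f1", 0.5), ("f2", 0.9), ("f3", 0.6)]:
        updater.observe(name, score)
    print(updater.dynamic_template)  # prints "f2"
```

Gating the update on a confidence threshold is a common way to avoid replacing the template with an occluded or drifting crop during long sequences; the actual criterion used by GLT may differ.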

Source journal

Pattern Recognition Letters (Engineering & Technology - Computer Science: Artificial Intelligence)

CiteScore

12.40

Self-citation rate

5.90%

Articles published

287

Review time

9.1 months

Journal description: Pattern Recognition Letters aims at rapid publication of concise articles of broad interest in pattern recognition. Subject areas include all the current fields of interest represented by the Technical Committees of the International Association for Pattern Recognition, and other developing themes involving learning and recognition.