Li Zhao, Chenxiang Fan, Min Li, Zhonglong Zheng, Xiaoqin Zhang
Global–local feature-mixed network with template update for visual tracking
Pattern Recognition Letters, Volume 188, Pages 111-116 (published 2025-02-01)
DOI: 10.1016/j.patrec.2024.11.034
https://www.sciencedirect.com/science/article/pii/S0167865524003465
Abstract
Deep learning trackers owe their success to powerful local and global feature extraction. However, neither Siamese-based trackers, which rely on local convolution, nor Transformer-based trackers, which rely on a global Transformer, fully utilize the frames. As a result, these trackers lose accuracy when the target's appearance changes. This paper proposes a global–local feature-mixed tracker, named GLT, that combines the complementary advantages of global and local frame features. GLT uses depth-wise convolution with dynamic weights to extract local features and a residual Transformer to extract global features. By drawing on both global and local details, the method tracks accurately and robustly. GLT also includes a key-frame-based template update strategy to address the challenges of long-term tracking. Extensive experiments show that GLT achieves excellent performance on short-term and long-term benchmarks, including GOT-10k, TrackingNet and LaSOT. Furthermore, because it does not need as many attention operations as other Transformer-based trackers, GLT has fewer parameters and runs in real time.
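To make the local branch more concrete, the following is a minimal sketch of depth-wise convolution with input-dependent weights, written in plain Python. It is not the GLT implementation: the abstract does not say how the dynamic weights are produced, so the `dynamic_kernels` gate (a sigmoid of each channel's mean) and both function names are assumptions chosen purely for illustration. Only the depth-wise property itself, that each channel is filtered by its own kernel with no mixing across channels, is standard.

```python
import math


def depthwise_conv2d(x, kernels):
    """Valid-padding depth-wise cross-correlation.

    x: list of C channels, each an H x W list of lists.
    kernels: list of C square kernels; channel c is filtered only by kernels[c].
    """
    out = []
    for channel, kernel in zip(x, kernels):
        k = len(kernel)
        h, w = len(channel), len(channel[0])
        out.append([
            [
                sum(channel[i + di][j + dj] * kernel[di][dj]
                    for di in range(k) for dj in range(k))
                for j in range(w - k + 1)
            ]
            for i in range(h - k + 1)
        ])
    return out


def dynamic_kernels(x, base_kernels):
    """Hypothetical weight generator (an assumption, not from the paper):
    scale each channel's base kernel by a sigmoid gate of that channel's mean."""
    kernels = []
    for channel, base in zip(x, base_kernels):
        mean = sum(map(sum, channel)) / (len(channel) * len(channel[0]))
        gate = 1.0 / (1.0 + math.exp(-mean))
        kernels.append([[gate * v for v in row] for row in base])
    return kernels
```

Calling `depthwise_conv2d(x, dynamic_kernels(x, base_kernels))` applies kernels that change with every input, which is the general idea behind dynamic weights. A real tracker would presumably learn the weight generator rather than use a fixed gate.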
About the journal:
Pattern Recognition Letters aims at rapid publication of concise articles of broad interest in pattern recognition.
Subject areas include all the current fields of interest represented by the Technical Committees of the International Association for Pattern Recognition, and other developing themes involving learning and recognition.