FocTrack: Focus attention for visual tracking
Jian Tao, Sixian Chan, Zhenchao Shi, Cong Bai, Shengyong Chen
Pattern Recognition, Volume 160, Article 111128. Published 2024-11-15. DOI: 10.1016/j.patcog.2024.111128
https://www.sciencedirect.com/science/article/pii/S0031320324008793
Citation count: 0
Abstract
Transformer trackers have achieved widespread success thanks to their attention mechanism. The vanilla attention mechanism models long-range dependencies between tokens to gain a global perspective. However, when humans track an object, their line of sight first skims apparent regions and then focuses on the differences between similar regions. To explore this behavior, we build a powerful online tracker with focus attention, named FocTrack. First, we design a focus attention module that applies an iterative binary clustering function (IBCF) before self-attention to simulate this human behavior. Specifically, for a given cluster, the other clusters are treated as apparent tokens that are skimmed during the clustering process, while the subsequent self-attention performs focused discriminative learning on the target cluster. Moreover, we propose a local template update strategy (LTUS) to exploit effective temporal information for visual object tracking. During testing, LTUS replaces only outdated local templates, which preserves overall reliability while keeping the computational burden low. Finally, extensive experiments show that FocTrack achieves state-of-the-art performance on several benchmarks. In particular, FocTrack achieves 71.5% AUC on LaSOT and 84.7% AUC on TrackingNet while running at around 36 FPS, outperforming popular approaches.
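The abstract describes focus attention only at a high level, so the following is a minimal, hypothetical sketch of the general "cluster first, then attend within each cluster" idea, not the authors' implementation. It assumes that IBCF can be approximated by repeatedly bisecting each cluster with a small 2-means loop, and that self-attention is single-head scaled dot-product attention with identity query/key/value projections (no learned weights). All function names and the `depth` parameter are illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: List[Vector]) -> Vector:
    """Coordinate-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def binary_split(idx: List[int], tokens: List[Vector], iters: int = 5) -> List[List[int]]:
    """Hypothetical stand-in for one IBCF step: split a cluster of token indices with 2-means."""
    # Deterministic init: the first token, and the token farthest from it.
    c0 = tokens[idx[0]]
    c1 = tokens[max(idx, key=lambda i: sq_dist(tokens[i], c0))]
    for _ in range(iters):
        near0 = [sq_dist(tokens[i], c0) <= sq_dist(tokens[i], c1) for i in idx]
        g0 = [i for i, n0 in zip(idx, near0) if n0]
        g1 = [i for i, n0 in zip(idx, near0) if not n0]
        if not g0 or not g1:
            return [idx]  # degenerate split (e.g. identical tokens): keep the cluster whole
        c0, c1 = mean([tokens[i] for i in g0]), mean([tokens[i] for i in g1])
    return [g0, g1]


def iterative_binary_clustering(tokens: List[Vector], depth: int) -> List[List[int]]:
    """Bisect every cluster `depth` times, yielding at most 2**depth clusters."""
    clusters = [list(range(len(tokens)))]
    for _ in range(depth):
        next_clusters: List[List[int]] = []
        for c in clusters:
            next_clusters.extend(binary_split(c, tokens) if len(c) > 1 else [c])
        clusters = next_clusters
    return clusters


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(vectors: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product attention with identity Q/K/V (an assumption)."""
    d = len(vectors[0])
    out = []
    for q in vectors:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in vectors])
        out.append([sum(w * v[j] for w, v in zip(weights, vectors)) for j in range(d)])
    return out


def focus_attention(tokens: List[Vector], depth: int = 1) -> List[Vector]:
    """Cluster the tokens, then let each token attend only to tokens in its own cluster."""
    out: List[Vector] = [[] for _ in tokens]
    for cluster in iterative_binary_clustering(tokens, depth):
        attended = self_attention([tokens[i] for i in cluster])
        for i, vec in zip(cluster, attended):
            out[i] = vec  # write results back to the original token positions
    return out


if __name__ == "__main__":
    toks = [[0.0, 0.1], [0.2, 0.0], [10.0, 9.8], [9.9, 10.1]]
    print(iterative_binary_clustering(toks, depth=1))
    print(focus_attention(toks, depth=1))
```

With `depth=0` every token stays in a single cluster and `focus_attention` reduces to ordinary global self-attention; each additional level of bisection restricts attention to smaller groups of similar tokens, loosely mirroring the "skim, then focus" behavior the abstract describes. The actual IBCF and attention design in FocTrack may differ substantially.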
About the journal:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.