{"title":"AdaMoT: Adaptive Motion-Aware Transformer for Efficient Visual Tracking","authors":"Yongjun Wang;Xiaohui Hao","doi":"10.1109/LSP.2025.3553429","DOIUrl":null,"url":null,"abstract":"Visual object tracking utilizing adaptive computation presents challenges stemming from the complexities of modeling intricate motion patterns and achieving computational efficiency. While recent transformer-based trackers have shown promising results, they struggle to effectively capture varying motion dynamics and often waste computation on less informative regions, leading to degraded performance under fast motion and occlusion. In this letter, we present AdaMoT, an innovative motion-aware transformer framework featuring three lightweight modules that integrate adaptive attention and motion estimation: a Lightweight Adaptive Motion Estimation (LAME) module that guides transformer attention through motion pattern modeling, a Saliency-based Hard Attention Sampling (SHAS) module that reduces computation by 60% through focusing on motion-critical regions, and an Adaptive ViT Attention Head Adjustment (AVAHA) module that dynamically allocates attention heads based on motion complexity. Our framework uniquely integrates motion estimation with transformer attention through a shared feature space, achieving robust tracking with minimal overhead. Comprehensive testing indicate that AdaMoT attains superior performance on various demanding benchmarks (75.1% AO on GOT-10 k, 84.9% AUC on TrackingNet, 72.9% AUC on LaSOT) while maintaining real-time speed (32.1 FPS) with only 4% FLOPs increase.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1450-1454"},"PeriodicalIF":3.2000,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Signal Processing Letters","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10935672/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0
Abstract
Visual object tracking with adaptive computation is challenging: a tracker must model intricate motion patterns while remaining computationally efficient. While recent transformer-based trackers have shown promising results, they struggle to effectively capture varying motion dynamics and often waste computation on less informative regions, leading to degraded performance under fast motion and occlusion. In this letter, we present AdaMoT, a motion-aware transformer framework featuring three lightweight modules that integrate adaptive attention and motion estimation: a Lightweight Adaptive Motion Estimation (LAME) module that guides transformer attention through motion pattern modeling, a Saliency-based Hard Attention Sampling (SHAS) module that reduces computation by 60% by focusing on motion-critical regions, and an Adaptive ViT Attention Head Adjustment (AVAHA) module that dynamically allocates attention heads based on motion complexity. Our framework uniquely integrates motion estimation with transformer attention through a shared feature space, achieving robust tracking with minimal overhead. Comprehensive experiments indicate that AdaMoT attains superior performance on several demanding benchmarks (75.1% AO on GOT-10k, 84.9% AUC on TrackingNet, 72.9% AUC on LaSOT) while maintaining real-time speed (32.1 FPS) with only a 4% increase in FLOPs.
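The abstract describes the modules only at a high level, so the following is a minimal PyTorch sketch of the SHAS idea: score each token for saliency and keep only the top fraction for the subsequent attention blocks. The HardTokenSampler name, the linear saliency head, and the 0.4 keep ratio are illustrative assumptions, not the paper's implementation; only the point that dropping non-salient tokens yields a roughly 60% computation reduction comes from the abstract.

# Minimal sketch of saliency-based hard token sampling (assumed design).
import torch
import torch.nn as nn

class HardTokenSampler(nn.Module):
    """Score tokens and keep only the most salient ones for later attention blocks."""

    def __init__(self, dim: int, keep_ratio: float = 0.4):
        super().__init__()
        self.keep_ratio = keep_ratio
        self.score = nn.Linear(dim, 1)  # hypothetical per-token saliency head

    def forward(self, tokens):
        # tokens: (batch, num_tokens, dim)
        b, n, d = tokens.shape
        saliency = self.score(tokens).squeeze(-1)             # (b, n)
        k = max(1, int(n * self.keep_ratio))
        top_idx = saliency.topk(k, dim=1).indices             # (b, k)
        gather_idx = top_idx.unsqueeze(-1).expand(-1, -1, d)  # (b, k, d)
        kept = tokens.gather(1, gather_idx)                   # (b, k, d)
        return kept, top_idx  # indices allow scattering results back later

sampler = HardTokenSampler(dim=256, keep_ratio=0.4)
feats = torch.randn(2, 196, 256)  # e.g. a 14x14 grid of search-region tokens
kept, idx = sampler(feats)
print(kept.shape)  # torch.Size([2, 78, 256])

With a 0.4 keep ratio, token-wise operations see 60% less work, and the quadratic attention cost over the kept tokens shrinks even further; the returned indices would let a tracker scatter the processed tokens back into the full feature map.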
Journal Description:
The IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language and audio processing. Papers published in the Letters can be presented within one year of their appearance at signal processing conferences such as ICASSP, GlobalSIP and ICIP, and also at several workshops organized by the Signal Processing Society.