AdaMoT: Adaptive Motion-Aware Transformer for Efficient Visual Tracking

IF 3.2 · CAS Region 2 (Engineering & Technology) · JCR Q2, Engineering, Electrical & Electronic
Yongjun Wang;Xiaohui Hao
{"title":"AdaMoT: Adaptive Motion-Aware Transformer for Efficient Visual Tracking","authors":"Yongjun Wang;Xiaohui Hao","doi":"10.1109/LSP.2025.3553429","DOIUrl":null,"url":null,"abstract":"Visual object tracking utilizing adaptive computation presents challenges stemming from the complexities of modeling intricate motion patterns and achieving computational efficiency. While recent transformer-based trackers have shown promising results, they struggle to effectively capture varying motion dynamics and often waste computation on less informative regions, leading to degraded performance under fast motion and occlusion. In this letter, we present AdaMoT, an innovative motion-aware transformer framework featuring three lightweight modules that integrate adaptive attention and motion estimation: a Lightweight Adaptive Motion Estimation (LAME) module that guides transformer attention through motion pattern modeling, a Saliency-based Hard Attention Sampling (SHAS) module that reduces computation by 60% through focusing on motion-critical regions, and an Adaptive ViT Attention Head Adjustment (AVAHA) module that dynamically allocates attention heads based on motion complexity. Our framework uniquely integrates motion estimation with transformer attention through a shared feature space, achieving robust tracking with minimal overhead. Comprehensive testing indicate that AdaMoT attains superior performance on various demanding benchmarks (75.1% AO on GOT-10 k, 84.9% AUC on TrackingNet, 72.9% AUC on LaSOT) while maintaining real-time speed (32.1 FPS) with only 4% FLOPs increase.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1450-1454"},"PeriodicalIF":3.2000,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Signal Processing Letters","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10935672/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0

Abstract

Visual object tracking with adaptive computation is challenging: a tracker must model intricate motion patterns while remaining computationally efficient. While recent transformer-based trackers have shown promising results, they struggle to capture varying motion dynamics and often waste computation on uninformative regions, degrading performance under fast motion and occlusion. In this letter, we present AdaMoT, a motion-aware transformer framework with three lightweight modules that integrate adaptive attention and motion estimation: a Lightweight Adaptive Motion Estimation (LAME) module that guides transformer attention through motion pattern modeling, a Saliency-based Hard Attention Sampling (SHAS) module that reduces computation by 60% by focusing on motion-critical regions, and an Adaptive ViT Attention Head Adjustment (AVAHA) module that dynamically allocates attention heads according to motion complexity. The framework integrates motion estimation with transformer attention through a shared feature space, achieving robust tracking with minimal overhead. Comprehensive experiments show that AdaMoT attains superior performance on demanding benchmarks (75.1% AO on GOT-10k, 84.9% AUC on TrackingNet, 72.9% AUC on LaSOT) while maintaining real-time speed (32.1 FPS) with only a 4% increase in FLOPs.
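The full letter is not reproduced on this page, but the abstract is concrete enough to illustrate two of the mechanisms it names: saliency-driven hard token sampling (SHAS-like) and motion-complexity-driven head gating (AVAHA-like). Below is a minimal PyTorch sketch, not the authors' implementation; the linear saliency head, the 0.4 keep ratio (chosen to loosely mirror the claimed ~60% computation reduction), and the scalar motion-complexity cue are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


class SaliencyTokenSampler(nn.Module):
    """Hard attention sampling: keep only the top-k most salient tokens."""

    def __init__(self, dim: int, keep_ratio: float = 0.4):
        super().__init__()
        self.keep_ratio = keep_ratio      # assumed value, to mirror the ~60% cut
        self.score = nn.Linear(dim, 1)    # per-token saliency head (assumed form)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, C) search-region features; keep the top-k along N.
        b, n, c = tokens.shape
        k = max(1, int(n * self.keep_ratio))
        saliency = self.score(tokens).squeeze(-1)   # (B, N) saliency scores
        idx = saliency.topk(k, dim=1).indices       # (B, k) kept-token indices
        return tokens.gather(1, idx.unsqueeze(-1).expand(b, k, c))


class GatedMultiheadAttention(nn.Module):
    """Self-attention whose heads are soft-gated by a motion-complexity cue."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.gate = nn.Linear(1, num_heads)  # motion complexity -> head weights

    def forward(self, x: torch.Tensor, motion_complexity: torch.Tensor):
        # x: (B, N, C); motion_complexity: (B, 1) scalar cue per sequence.
        b, n, c = x.shape
        out, _ = self.attn(x, x, x)                           # standard MHSA
        gates = torch.sigmoid(self.gate(motion_complexity))   # (B, H) in [0, 1]
        # Simplification: gate channel groups of the projected output; true
        # per-head gating would act before the output projection.
        out = out.view(b, n, self.num_heads, c // self.num_heads)
        out = out * gates.view(b, 1, self.num_heads, 1)
        return out.reshape(b, n, c)


if __name__ == "__main__":
    x = torch.randn(2, 256, 384)                 # 256 search tokens, width 384
    kept = SaliencyTokenSampler(384)(x)          # -> (2, 102, 384)
    y = GatedMultiheadAttention(384, 8)(kept, torch.rand(2, 1))
    print(kept.shape, y.shape)
```

Dropping tokens before attention is what makes this kind of saving worthwhile: self-attention cost scales quadratically with the token count, so keeping 40% of the tokens cuts attention FLOPs to roughly 16% of the original, at the risk of discarding regions the saliency head scores poorly.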
Source Journal
IEEE Signal Processing Letters (Engineering: Electrical & Electronic)
CiteScore: 7.40
Self-citation rate: 12.80%
Articles per year: 339
Review time: 2.8 months
Journal description: The IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language and audio processing. Papers published in the Letters can be presented within one year of their appearance in signal processing conferences such as ICASSP, GlobalSIP and ICIP, and also in several workshops organized by the Signal Processing Society.