Propagating prior information with transformer for robust visual object tracking

IF 3.5 · CAS Tier 3 (Computer Science) · JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS
Yue Wu, Chengtao Cai, Chai Kiat Yeo
Journal: Multimedia Systems
DOI: 10.1007/s00530-024-01423-8
Published: 2024-08-13 (Journal Article)
Citations: 0

Abstract

In recent years, the domain of visual object tracking has witnessed considerable advancements with the advent of deep learning methodologies. Siamese-based trackers have been pivotal, establishing a new architecture with a weight-shared backbone. With the inclusion of the transformer, the attention mechanism has been exploited to enhance feature discriminability across successive frames. However, the limited adaptability of many existing trackers to different tracking scenarios has led to inaccurate target localization. To solve this issue effectively, in this paper we integrate a Siamese network with a transformer: the former uses ResNet50 as the backbone network to extract target features, while the latter consists of an encoder and a decoder, where the encoder exploits global contextual information to obtain discriminative features. Simultaneously, we employ the decoder to propagate prior information about the target, which enables the tracker to locate the target in a variety of environments, enhancing its stability and robustness. Extensive experiments on four major public datasets, OTB100, UAV123, GOT10k and LaSOText, demonstrate the effectiveness of the proposed method; its performance surpasses many state-of-the-art trackers. Additionally, the proposed tracker achieves a tracking speed of 60 fps, meeting the requirements for real-time tracking.
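The core mechanism the abstract describes, a decoder query carrying prior target information attending over the encoder's search-region features, can be illustrated with a minimal single-head cross-attention sketch. This is not the authors' implementation; all names, shapes, and the single-token query are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, keys, values):
    """Single-head cross-attention: a decoder query (the prior target
    embedding) attends over encoder features of the search region and
    aggregates them into one target-aware feature vector."""
    d = query.shape[-1]
    scores = query @ keys.T / np.sqrt(d)    # (1, N) similarity scores
    weights = softmax(scores, axis=-1)      # attention over N locations
    return weights @ values, weights        # aggregated feature, weights

# Hypothetical shapes: 1 prior-query token, N=16 search-region tokens, d=32.
rng = np.random.default_rng(0)
d, n = 32, 16
prior = rng.standard_normal((1, d))         # prior target embedding
feats = rng.standard_normal((n, d))         # encoder output (search region)
out, w = cross_attention(prior, feats, feats)
assert out.shape == (1, d) and np.isclose(w.sum(), 1.0)
```

In a full tracker, the aggregated feature would feed classification and regression heads for target localization; here the sketch only shows how prior information is propagated by weighting search-region features.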


Source Journal
Multimedia Systems (Engineering/Technology - Computer Science: Theory & Methods)
CiteScore: 5.40
Self-citation rate: 7.70%
Articles published: 148
Review time: 4.5 months
Journal description: This journal details innovative research ideas, emerging technologies, state-of-the-art methods and tools in all aspects of multimedia computing, communication, storage, and applications. It features theoretical, experimental, and survey articles.