Spatial feature embedding for robust visual object tracking

IF 1.5; Zone 4, Computer Science; Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IET Computer Vision Pub Date : 2023-12-20 DOI:10.1049/cvi2.12263

Kang Liu, Long Liu, Shangqi Yang, Zhihao Fu

IET Computer Vision, volume 18, issue 4, pages 540-556. Journal Article. Full text: https://onlinelibrary.wiley.com/doi/10.1049/cvi2.12263

Citations: 0

Abstract

Recently, the offline-trained Siamese pipeline has drawn wide attention due to its outstanding tracking performance. However, existing Siamese trackers rely on offline training to extract ‘universal’ features, which are insufficient to distinguish the target from fluctuating interference when embedding the information of the two branches, leading to inaccurate classification and localisation. In addition, Siamese trackers crop the search candidate region at a pre-defined scale based on the previous frame's result, which can easily introduce redundant background noise (clutter, similar objects, etc.) and reduce the tracker's robustness. To solve these problems, the authors propose two novel sub-networks for spatial feature embedding in robust object tracking. Specifically, the proposed spatial remapping (SRM) network enhances the feature discrepancy between the target and distractor categories through online remapping, improving the tracker's discriminative ability in the embedding space. Model-agnostic meta-learning (MAML) is used to optimise the SRM network to ensure its adaptability to complex tracking scenarios. Moreover, a temporal information proposal-guided (TPG) network is introduced, which uses a gated recurrent unit (GRU) model to dynamically predict the search scale from temporal motion states, reducing potential background interference. The two proposed networks are integrated into two popular trackers, SiamFC++ and TransT, yielding SiamSRMC and SiamSRMT, respectively, which achieve superior performance on six challenging benchmarks: OTB100, VOT2019, UAV123, GOT10K, TrackingNet and LaSOT. Moreover, the proposed trackers obtain competitive tracking performance compared with state-of-the-art trackers on the background clutter and similar object attributes, validating the effectiveness of the method.
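
The contrast between a pre-defined search scale and a GRU-predicted one can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' TPG network: the fixed-scale formula follows the common SiamFC-style convention (context factor 0.5, search-to-exemplar ratio 255/127), which may differ from the baselines used in the paper; the names `fixed_search_side`, `motion_states`, `TinyGRU` and `predict_ratio`, the choice of motion features, and the ratio range [1.5, 3.0] are all hypothetical; and the GRU weights are random rather than trained.

```python
import math
import random


def fixed_search_side(w, h, context=0.5, ratio=255 / 127):
    """Side of the square search window for a previous box of size (w, h).

    SiamFC-style convention (an assumption here): pad the box with context,
    take the geometric mean of the padded sides, then enlarge by a ratio.
    """
    pad = context * (w + h)
    exemplar_side = math.sqrt((w + pad) * (h + pad))
    return exemplar_side * ratio


def motion_states(boxes):
    """Per-step features from consecutive (cx, cy, w, h) boxes (hypothetical choice):
    normalised centre shift and log size change."""
    states = []
    for (cx0, cy0, w0, h0), (cx1, cy1, w1, h1) in zip(boxes, boxes[1:]):
        states.append([(cx1 - cx0) / w0, (cy1 - cy0) / h0,
                       math.log(w1 / w0), math.log(h1 / h0)])
    return states


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class TinyGRU:
    """Single GRU cell with random, untrained weights (illustration only)."""

    def __init__(self, n_in, n_hid, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.n_hid = n_hid
        self.Wz, self.Wr, self.Wh = mat(n_hid, n_in), mat(n_hid, n_in), mat(n_hid, n_in)
        self.Uz, self.Ur, self.Uh = mat(n_hid, n_hid), mat(n_hid, n_hid), mat(n_hid, n_hid)
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hid)]

    def step(self, x, h):
        # Update gate z, reset gate r, candidate state, then blend old and new state.
        z = [sigmoid(a + b) for a, b in zip(matvec(self.Wz, x), matvec(self.Uz, h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.Wr, x), matvec(self.Ur, h))]
        rh = [r_i * h_i for r_i, h_i in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in zip(matvec(self.Wh, x), matvec(self.Uh, rh))]
        return [(1 - z_i) * h_i + z_i * c_i for z_i, h_i, c_i in zip(z, h, cand)]

    def predict_ratio(self, states, lo=1.5, hi=3.0):
        """Run the GRU over the motion states and squash the output into [lo, hi]."""
        h = [0.0] * self.n_hid
        for x in states:
            h = self.step(x, h)
        gate = sigmoid(sum(w_i * h_i for w_i, h_i in zip(self.w_out, h)))
        return lo + (hi - lo) * gate


# Usage: three past boxes (cx, cy, w, h) give two motion states.
boxes = [(100, 100, 40, 60), (104, 101, 41, 60), (109, 103, 42, 61)]
ratio = TinyGRU(n_in=4, n_hid=8).predict_ratio(motion_states(boxes))
fixed_side = fixed_search_side(42, 61)                 # pre-defined scale
dynamic_side = fixed_search_side(42, 61, ratio=ratio)  # motion-dependent scale
```

In a trained system the GRU weights would be learned so that fast or erratic motion widens the window and steady motion narrows it; with the random weights used here, the output only demonstrates the data flow from box history to search-window size.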


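Source journal

IET Computer Vision (Engineering Technology; Engineering: Electrical & Electronic)

CiteScore

3.30

Self-citation rate

11.80%

Number of articles published

Review time

3.4 months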

Journal description: IET Computer Vision seeks original research papers in a wide range of areas of computer vision. The vision of the journal is to publish the highest quality research work that is relevant and topical to the field, without forgetting those works that aim to introduce new horizons and set the agenda for future avenues of research in computer vision. IET Computer Vision welcomes submissions on the following topics: Biologically and perceptually motivated approaches to low level vision (feature detection, etc.); Perceptual grouping and organisation; Representation, analysis and matching of 2D and 3D shape; Shape-from-X; Object recognition; Image understanding; Learning with visual inputs; Motion analysis and object tracking; Multiview scene analysis; Cognitive approaches in low, mid and high level vision; Control in visual systems; Colour, reflectance and light; Statistical and probabilistic models; Face and gesture; Surveillance; Biometrics and security; Robotics; Vehicle guidance; Automatic model acquisition; Medical image analysis and understanding; Aerial scene analysis and remote sensing; Deep learning models in computer vision. Both methodological and applications orientated papers are welcome. Manuscripts submitted are expected to include a detailed and analytical review of the literature, a state-of-the-art exposition of the original proposed research and its methodology, a thorough experimental evaluation and, last but not least, a comparative evaluation against relevant and state-of-the-art methods. Submissions not abiding by these minimum requirements may be returned to authors without being sent to review. Special Issues, Current Call for Papers: Computer Vision for Smart Cameras and Camera Networks - https://digital-library.theiet.org/files/IET_CVI_SC.pdf; Computer Vision for the Creative Industries - https://digital-library.theiet.org/files/IET_CVI_CVCI.pdf