HTransT++: Hierarchical Transformer with Temporal Memory and Spatial Attention for Visual Tracking

Zhixue Liang, Wenyong Dong, Bo Zhang
DOI: 10.1109/ICNSC55942.2022.10004052
Published in: 2022 IEEE International Conference on Networking, Sensing and Control (ICNSC)
Publication date: 2022-12-15
Citations: 0

Abstract

Transformer-based architectures have recently driven significant progress in visual object tracking. However, most transformer-based trackers adopt hybrid networks that use convolutional neural networks (CNNs) to extract features and transformers to fuse and enhance them. Furthermore, most transformer-based trackers consider only the spatial dependencies between the target object and the search region, ignoring temporal relations. Considering both the temporal and spatial properties inherent in video sequences, this paper presents a hierarchical transformer with a temporal memory and spatial attention network for visual tracking, named HTransT++. The proposed network employs a hierarchical transformer as the backbone to extract multi-level features. By adopting a transformer-based encoder and decoder to fuse historical template features with search-region image features, the network captures spatial and temporal dependencies across video frames during tracking. Extensive experiments show that our proposed method (HTransT++) achieves outstanding performance on four visual tracking benchmarks, including VOT2018, GOT-10K, TrackingNet, and LaSOT, while running at real-time speed.
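The core fusion step the abstract describes — attending from search-region features to a memory of historical template features — can be sketched as a single-head scaled dot-product cross-attention. This is a minimal illustrative sketch, not the authors' implementation: the function names, token counts, feature dimension, and the single-head simplification are all assumptions for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fusion(search_feats, memory_feats):
    """Single-head cross-attention: queries come from the search
    region, keys/values from the temporal memory of template
    features accumulated over past frames."""
    d_k = search_feats.shape[-1]
    # Similarity of each search token to each memory token.
    scores = search_feats @ memory_feats.T / np.sqrt(d_k)   # (Ns, Nm)
    weights = softmax(scores, axis=-1)                      # rows sum to 1
    # Each search token becomes a weighted mix of memory tokens.
    return weights @ memory_feats                           # (Ns, d)

rng = np.random.default_rng(0)
search = rng.standard_normal((16, 32))     # 16 search-region tokens, dim 32
memory = rng.standard_normal((3 * 8, 32))  # 3 historical templates x 8 tokens
fused = cross_attention_fusion(search, memory)
print(fused.shape)  # (16, 32)
```

In the full model this fusion would be wrapped in a transformer encoder-decoder with learned projections and multiple heads; the sketch only shows how temporal (memory) and spatial (search-region) information meet in one attention operation.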