Ensemble Long Short-Term Tracking with ConvNeXt and Transformer

Yuhua Xiao, Yifeng Zhang, Pengyu Ni
2022 7th International Conference on Image, Vision and Computing (ICIVC), published 2022-07-26
DOI: 10.1109/ICIVC55077.2022.9887117 (https://doi.org/10.1109/ICIVC55077.2022.9887117)
Citations: 1

Abstract

Visual object tracking is an important research topic in Computer Vision. The widely used Siamese network architecture learns a similarity metric between target objects and search regions, and locates the targets in video sequences. In this paper, we present an ensemble long short-term tracking algorithm based on ConvNeXt and Transformer. Firstly, a Siamese network with the ConvNeXt backbone is applied to extract features for both target and search regions. Secondly, an encoder-decoder transformer is introduced to capture global feature dependencies. In addition, an IoU-confidence-based tracking ensemble algorithm is designed to capture both long-term stable appearances and short-term variable appearances of the target. The proposed tracker, called STARK-NeXt, achieves a success rate of 68.9% on LaSOT, outperforming STARK by 1.8%.
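
The abstract does not spell out how the IoU-confidence-based ensemble works, so the following is only a minimal sketch of the two ingredients it names: intersection over union (IoU) between boxes, and a confidence score per prediction. The `iou` function is the standard definition. The `select_box` rule, its `agree_iou` threshold, and the long-term/short-term argument names are hypothetical illustrations, not the method used in STARK-NeXt.

```python
from typing import Sequence

# A box is (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Box = Sequence[float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlap rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_box(long_box: Box, long_conf: float,
               short_box: Box, short_conf: float,
               agree_iou: float = 0.5) -> Box:
    """Hypothetical selection rule (an assumption, not the paper's algorithm).

    If the long-term and short-term predictions overlap enough, keep the
    long-term box; otherwise keep whichever prediction is more confident.
    """
    if iou(long_box, short_box) >= agree_iou:
        return long_box
    return long_box if long_conf >= short_conf else short_box


if __name__ == "__main__":
    # Two 2x2 boxes offset by one unit overlap in a 1x1 square: IoU = 1/7.
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
    # Disjoint predictions: the more confident one is returned.
    print(select_box((0, 0, 2, 2), 0.4, (5, 5, 7, 7), 0.9))
```

In this sketch the threshold trades stability against adaptability: a higher `agree_iou` hands more decisions to the confidence comparison, while a lower one favors the long-term prediction.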