Ensemble Long Short-Term Tracking with ConvNeXt and Transformer

2022 7th International Conference on Image, Vision and Computing (ICIVC), Pub Date: 2022-07-26, DOI: 10.1109/ICIVC55077.2022.9887117

Yuhua Xiao, Yifeng Zhang, Pengyu Ni

Citations: 1

Abstract

Ensemble Long Short-Term Tracking with ConvNeXt and Transformer

Visual object tracking is an important research topic in Computer Vision. The widely used Siamese network architecture learns a similarity metric between target objects and search regions, and locates the targets in video sequences. In this paper, we present an ensemble long short-term tracking algorithm based on ConvNeXt and Transformer. Firstly, a Siamese network with the ConvNeXt backbone is applied to extract features for both target and search regions. Secondly, an encoder-decoder transformer is introduced to capture global feature dependencies. In addition, an IoU-confidence-based tracking ensemble algorithm is designed to capture both long-term stable appearances and short-term variable appearances of the target. The proposed tracker, called STARK-NeXt, achieves a success rate of 68.9% on LaSOT, outperforming STARK by 1.8%.
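The abstract relies on IoU (intersection over union) both in the ensemble step and, implicitly, in the reported success rate, but gives no formulas. The following is a minimal sketch of how box IoU and a success score are commonly computed in single-object tracking benchmarks. It is an illustration under stated assumptions, not the authors' code: the `(x, y, w, h)` box format, the function names `iou` and `success_auc`, and the 21 evenly spaced thresholds are all choices made here and are not taken from the paper.

```python
from typing import List, Sequence

# Assumed box format: (x, y, w, h) with (x, y) the top-left corner.
Box = Sequence[float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the overlap rectangle, clamped at zero.
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_auc(preds: List[Box], gts: List[Box], num_thresholds: int = 21) -> float:
    """Mean fraction of frames whose IoU exceeds each threshold in [0, 1].

    This approximates the area under the success plot; the threshold
    count is an assumption of this sketch.
    """
    overlaps = [iou(p, g) for p, g in zip(preds, gts)]
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(rates) / len(rates)
```

Calling `success_auc(predicted_boxes, ground_truth_boxes)` on one sequence returns a value in [0, 1]; benchmark-level figures such as the one quoted above are typically aggregated over all sequences, and the exact protocol should be checked against the benchmark's own toolkit.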
