TransTracking for UAV: An Autonomous Real-time Target Tracking System for UAV via Transformer Tracking

2021 International Conference on Intelligent Technology and Embedded Systems (ICITES). Pub Date: 2021-10-31. DOI: 10.1109/ICITES53477.2021.9637088

Xiaolou Sun, Fei Xie, Qi Wang, Yuncon Yao, Hao Wang, Wei Wang, Wankou Yang


Citations: 0

Abstract

Most existing Siamese-type trackers adopt pre-defined anchor boxes or anchor-free schemes to estimate the target bounding box accurately. Unfortunately, they rely on complicated hand-designed components and tedious post-processing, whose parameters are hard to adjust for specific scenes in real applications. To alleviate this issue, we propose a new scheme that formulates visual tracking as a direct set prediction problem. Its main component is a transformer attached to a Siamese-type feature extraction network, so we call the framework Siamese Network with Transformers (SiamTFR). Using a small, fixed set of learned object queries, we enforce a unique final set of predictions via bipartite matching, which significantly reduces the hyperparameters associated with candidate boxes. Because the framework produces unique predictions, it greatly eases the heavy burden of searching post-processing hyperparameters in visual tracking. Extensive experiments on visual tracking benchmarks, including GOT-10K, demonstrate that SiamTFR achieves competitive performance while running at 50 FPS. In particular, SiamTFR outperforms the leading anchor-based tracker SiamRPN++ on the GOT-10K benchmark, confirming its effectiveness and efficiency. Furthermore, SiamTFR is deployed on an embedded device, where the algorithm runs at 30 FPS, or at 54 FPS with TensorRT, meeting real-time requirements. In addition, we design a complete tracking system demo that works on real roads, narrowing the gap between academic models and industrial deployments.
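To make "direct set prediction via bipartite matching" concrete, here is a minimal sketch in Python. The abstract does not state SiamTFR's matching cost, so this assumes the DETR-style cost (negative target probability, L1 box distance, and negative generalized IoU, weighted 1, 5, 2) on boxes given as normalized (cx, cy, w, h). The helper names `match_cost`, `bipartite_match`, and `predict` are hypothetical. During training, each ground-truth box is assigned to exactly one query; the remaining queries would be supervised as "no object". At inference, the box is read from the most confident query, so no anchor settings, NMS, or cosine-window penalties need tuning.

```python
# Sketch of DETR-style set-prediction matching for tracking.
# ASSUMPTIONS (not from the SiamTFR paper): boxes are (cx, cy, w, h) in [0, 1]
# with positive width/height, and the cost uses DETR's weights (1, 5, 2).
from itertools import permutations
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h)


def to_corners(b: Box) -> Tuple[float, float, float, float]:
    cx, cy, w, h = b
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2


def giou(a: Box, b: Box) -> float:
    """Generalized IoU: IoU minus the empty fraction of the enclosing box."""
    ax0, ay0, ax1, ay1 = to_corners(a)
    bx0, by0, bx1, by1 = to_corners(b)
    iw = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    ih = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = iw * ih
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    enclose = (max(ax1, bx1) - min(ax0, bx0)) * (max(ay1, by1) - min(ay0, by0))
    return inter / union - (enclose - union) / enclose


def match_cost(prob: float, pred: Box, gt: Box,
               w_cls: float = 1.0, w_l1: float = 5.0, w_giou: float = 2.0) -> float:
    """Hypothetical DETR-style cost: lower means a better query/target pair."""
    l1 = sum(abs(p - g) for p, g in zip(pred, gt))
    return -w_cls * prob + w_l1 * l1 - w_giou * giou(pred, gt)


def bipartite_match(probs: Sequence[float], preds: Sequence[Box],
                    gts: Sequence[Box]) -> List[Tuple[int, int]]:
    """Exact one-to-one assignment minimizing total cost.

    Returns (query_index, target_index) pairs. Brute force over permutations
    is only for tiny sets; practical code uses the Hungarian algorithm.
    In single-object tracking len(gts) == 1, so this is an argmin over queries.
    """
    best, best_cost = None, float("inf")
    for queries in permutations(range(len(preds)), len(gts)):
        cost = sum(match_cost(probs[q], preds[q], gts[t])
                   for t, q in enumerate(queries))
        if cost < best_cost:
            best, best_cost = queries, cost
    return [(q, t) for t, q in enumerate(best)]


def predict(probs: Sequence[float], preds: Sequence[Box]) -> Box:
    """Inference: take the most confident query directly (no NMS, no window)."""
    return preds[max(range(len(preds)), key=lambda i: probs[i])]


if __name__ == "__main__":
    probs = [0.1, 0.9, 0.3]
    preds = [(0.2, 0.2, 0.1, 0.1), (0.5, 0.5, 0.2, 0.2), (0.52, 0.5, 0.3, 0.2)]
    gt = [(0.5, 0.5, 0.2, 0.2)]
    print(bipartite_match(probs, preds, gt))  # [(1, 0)]
    print(predict(probs, preds))              # (0.5, 0.5, 0.2, 0.2)
```

Because each target claims exactly one query, duplicate boxes around the target are penalized during training rather than suppressed afterwards, which is what removes the need for candidate-box post-processing.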
