TransTracking for UAV: An Autonomous Real-time Target Tracking System for UAV via Transformer Tracking
Authors: Xiaolou Sun, Fei Xie, Qi Wang, Yuncon Yao, Hao Wang, Wei Wang, Wankou Yang
Published in: 2021 International Conference on Intelligent Technology and Embedded Systems (ICITES), 2021-10-31
DOI: 10.1109/ICITES53477.2021.9637088 (https://doi.org/10.1109/ICITES53477.2021.9637088)
Citations: 0
Abstract
Most existing Siamese-type trackers rely on pre-defined anchor boxes or anchor-free schemes to estimate the target's bounding box. Unfortunately, these designs depend on complicated hand-designed components and tedious post-processing, and their parameters are hard to tune for specific scenes in real applications. To alleviate this issue, we propose a new scheme that formulates visual tracking as a direct set prediction problem. The main component is a transformer attached to Siamese-type feature extraction networks; we therefore call the framework Siamese Network with Transformers (SiamTFR). With a small, fixed set of learned object queries, we enforce the final set of predictions via bipartite matching, significantly reducing the hyper-parameters associated with candidate boxes. Because the framework's predictions are unique, the heavy burden of hyper-parameter search for post-processing in visual tracking is significantly eased. Extensive experiments on visual tracking benchmarks, including GOT-10K, demonstrate that SiamTFR achieves competitive performance and runs at 50 FPS. Specifically, SiamTFR outperforms the leading anchor-based tracker SiamRPN++ on the GOT-10K benchmark, confirming its effectiveness and efficiency. Furthermore, SiamTFR is deployed on an embedded device, where it runs at 30 FPS, or at 54 FPS with TensorRT, meeting real-time requirements. In addition, we design a complete tracking system demo that works on real roads, narrowing the gap between academic models and industrial deployment.
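
To make the bipartite-matching step concrete, the following is a minimal sketch of how a small set of query predictions could be assigned one-to-one to ground-truth boxes during training. The abstract does not specify the matching cost, so the cost used here (negative confidence plus a weighted L1 box distance), the `box_weight` value, the `(cx, cy, w, h)` box format, and all function names are assumptions made for illustration, not SiamTFR's actual formulation. A brute-force search stands in for the Hungarian algorithm, which is adequate only for a handful of queries.

```python
from itertools import permutations


def l1_distance(box_a, box_b):
    """Sum of absolute coordinate differences between two (cx, cy, w, h) boxes."""
    return sum(abs(a - b) for a, b in zip(box_a, box_b))


def match_cost(pred, target, box_weight=5.0):
    """Assumed cost: low when the query is confident and its box is close."""
    return -pred["score"] + box_weight * l1_distance(pred["box"], target)


def bipartite_match(preds, targets):
    """Find the one-to-one assignment of predictions to targets with minimal cost.

    Returns (total_cost, assignment), where assignment[j] is the index of the
    prediction matched to targets[j]. Exhaustive search over assignments.
    """
    best_cost, best_assign = float("inf"), None
    for assign in permutations(range(len(preds)), len(targets)):
        cost = sum(match_cost(preds[i], targets[j]) for j, i in enumerate(assign))
        if cost < best_cost:
            best_cost, best_assign = cost, assign
    return best_cost, best_assign


# Three hypothetical query outputs and a single ground-truth target box.
preds = [
    {"score": 0.20, "box": (0.10, 0.10, 0.20, 0.20)},
    {"score": 0.90, "box": (0.52, 0.48, 0.30, 0.41)},
    {"score": 0.60, "box": (0.80, 0.75, 0.25, 0.30)},
]
targets = [(0.50, 0.50, 0.30, 0.40)]

cost, assignment = bipartite_match(preds, targets)
print(assignment)  # (1,): the confident, well-localized query is matched
```

Because each target is matched to exactly one query, the unmatched queries can be trained to predict "no object", which is what removes the need for anchor tuning and duplicate-suppression post-processing in set-prediction designs of this kind.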