Siamese Attention and Point Adaptive Network for Visual Tracking
T. Dinh, Long Tran Quoc, Kien Thai Trung
2021 International Conference on Multimedia Analysis and Pattern Recognition (MAPR), October 2021
DOI: 10.1109/MAPR53640.2021.9585250
Citations: 0
Abstract
Siamese-based trackers have achieved excellent performance in visual object tracking. Most existing trackers compute the features of the target template and the search image independently, and rely on either a multi-scale search scheme or pre-defined anchor boxes to estimate the scale and aspect ratio of a target. This paper proposes a Siamese attention and point adaptive head network, referred to as SiamAPN, for visual tracking. The Siamese attention module combines self-attention and cross-attention to enhance features and to aggregate rich contextual inter-dependencies between the target template and the search image. The point head network predicts bounding boxes in a manner that is both proposal-free and anchor-free. The proposed framework is simple and effective. Extensive experiments on visual tracking benchmarks, including OTB100, UAV123, and VOT2018, demonstrate that the tracker achieves state-of-the-art performance while running at 45 FPS.
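To illustrate the idea of combining self-attention on each branch with cross-attention from the search branch to the template branch, the following is a minimal numpy sketch. The feature-map sizes, channel count, and the additive aggregation are hypothetical choices for illustration, not the paper's actual architecture or hyperparameters:

```python
import numpy as np

def softmax(a, axis=-1):
    # numerically stable softmax
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product attention: (Nq, C), (Nk, C), (Nk, C) -> (Nq, C)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

# hypothetical shapes: a 6x6 template map and a 22x22 search map,
# both with C channels, flattened to (N, C) token matrices
C = 32
z = np.random.randn(6 * 6, C)    # template features
x = np.random.randn(22 * 22, C)  # search features

z_self = attention(z, z, z)      # self-attention within the template
x_self = attention(x, x, x)      # self-attention within the search image
# cross-attention: search tokens query the template tokens,
# aggregating template context into the search features
x_cross = attention(x_self, z_self, z_self)

enhanced = x_self + x_cross      # one possible way to fuse the two streams
```

The `enhanced` map would then feed an anchor-free point head that regresses a box at each spatial location; learned query/key/value projections and multi-head splitting, omitted here for brevity, would be used in a real implementation.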