{"title":"Discriminative correlation filter tracking algorithm with Transformer based on a multi-frame Cross-Attention mechanism","authors":"Jie Yuan, Shuo Chen, Zhaoyi Shi, Shaona Yu","doi":"10.1109/ICCSMT54525.2021.00071","PeriodicalId":304337,"journal":{"name":"2021 2nd International Conference on Computer Science and Management Technology (ICCSMT)","volume":"329 1","pages":"0"},"publicationDate":"2021-11-01","publicationTypes":"Journal Article","isOpenAccess":false,"citationCount":"0","platform":"Semanticscholar","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCSMT54525.2021.00071"}
Discriminative correlation filter tracking algorithm with Transformer based on a multi-frame Cross-Attention mechanism
Tracking methods based on discriminative correlation filters and Siamese networks are among the most active research topics in visual object tracking. A core problem for both is how to make full use of the rich spatio-temporal information about the target across the frames of a video sequence. To address this problem, target-related information from the first frame, the history frame, and the current frame is transformed throughout the tracking process with Cross-Attention as the core mechanism, and a Siamese-like architecture is used to characterize the target's features more completely. We propose a discriminative correlation filter tracking algorithm with a Transformer based on this multi-frame Cross-Attention mechanism, which improves tracking accuracy while keeping tracking speed essentially unchanged. We tested the model on the GOT-10k, TrackingNet, and OTB2015 datasets; the results demonstrate its effectiveness, with improved tracking accuracy at real-time speed.
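The abstract does not give the network details, so the following is only a minimal sketch of the general idea of cross-attention between frames, not the authors' architecture. It assumes that current-frame features act as queries and that features from the first frame and a history frame are simply concatenated to serve as keys and values. The function names, the toy feature vectors, and the concatenation step are all hypothetical and chosen for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention on plain Python lists.

    queries: feature vectors from the current frame.
    keys, values: feature vectors from the reference frames.
    Returns one attended vector per query, plus the attention weights.
    """
    dim = len(keys[0])
    out_dim = len(values[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every reference token.
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys
        ]
        w = softmax(scores)
        # Weighted sum of the reference values.
        attended = [
            sum(wi * v[j] for wi, v in zip(w, values)) for j in range(out_dim)
        ]
        outputs.append(attended)
        weights.append(w)
    return outputs, weights


# Hypothetical toy features: 2 tokens per frame, 4 channels each.
first_frame = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
history_frame = [[0.9, 0.1, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
current_frame = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

# Assumption: reference tokens from both earlier frames are concatenated.
reference = first_frame + history_frame
attended, attn = cross_attention(current_frame, reference, reference)
```

In this toy setup the first current-frame token attends most strongly to the matching first-frame token, while the second token, which resembles nothing in the reference frames, spreads its attention evenly. A real tracker would use learned projections for queries, keys, and values and would operate on feature maps from a backbone network rather than on hand-written vectors.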