{"title":"Multi-Frame Attention with Feature-Level Warping for Drone Crowd Tracking","authors":"Takanori Asanomi, Kazuya Nishimura, Ryoma Bise","doi":"10.1109/WACV56688.2023.00171","DOIUrl":null,"url":null,"abstract":"Drone crowd tracking has various applications such as crowd management and video surveillance. Unlike in general multi-object tracking, the size of the objects to be tracked are small, and the ground truth is given by a point-level annotation, which has no region information. This causes the lack of discriminative features for finding the same objects from many similar objects. Thus, similarity-based tracking techniques, which are widely used for multi-object tracking with bounding-box, are difficult to use. To deal with this problem, we take into account the temporal context of the local area. To aggregate temporal context in a local area, we propose a multi-frame attention with feature-level warping. The feature-level warping can align the features of the same object in multiple frames, and then multi-frame attention can effectively aggregate the temporal context from the warped features. The experimental results show the effectiveness of our method. Our method outperformed the state-of-the-art method in DroneCrowd dataset. The code is publicly available in https://github.com/asanomitakanori/mfa-feature-warping.","PeriodicalId":270631,"journal":{"name":"2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/WACV56688.2023.00171","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
Drone crowd tracking has various applications such as crowd management and video surveillance. Unlike in general multi-object tracking, the size of the objects to be tracked are small, and the ground truth is given by a point-level annotation, which has no region information. This causes the lack of discriminative features for finding the same objects from many similar objects. Thus, similarity-based tracking techniques, which are widely used for multi-object tracking with bounding-box, are difficult to use. To deal with this problem, we take into account the temporal context of the local area. To aggregate temporal context in a local area, we propose a multi-frame attention with feature-level warping. The feature-level warping can align the features of the same object in multiple frames, and then multi-frame attention can effectively aggregate the temporal context from the warped features. The experimental results show the effectiveness of our method. Our method outperformed the state-of-the-art method in DroneCrowd dataset. The code is publicly available in https://github.com/asanomitakanori/mfa-feature-warping.