One-Stage Anchor-Free Online Multiple Target Tracking With Deformable Local Attention and Task-Aware Prediction

IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2024-09-10. DOI: 10.1109/TPAMI.2024.3457886

Weiming Hu;Shaoru Wang;Zongwei Zhou;Jin Gao;Yangxi Li;Stephen Maybank

Volume 46, Issue 12, pp. 11446-11463. Journal Article. URL: https://ieeexplore.ieee.org/document/10674765/


Abstract

The tracking-by-detection paradigm currently dominates multiple target tracking algorithms. It usually includes three tasks: target detection, appearance feature embedding, and data association. Carrying out these three tasks successively usually leads to lower tracking efficiency. In this paper, we propose a one-stage anchor-free multiple task learning framework which carries out target detection and appearance feature embedding in parallel to substantially increase the tracking speed. This framework simultaneously predicts a target detection and produces a feature embedding for each location, by sharing a pyramid of feature maps. We propose a deformable local attention module which utilizes the correlations between features at different locations within a target to obtain more discriminative features. We further propose a task-aware prediction module which utilizes deformable convolutions to select the most suitable locations for the different tasks. At the selected locations, classification of samples into foreground or background, appearance feature embedding, and target box regression are carried out. Two effective training strategies, regression range overlapping and sample reweighting, are proposed to reduce missed detections in dense scenes. Ambiguous samples whose identities are difficult to determine are effectively dealt with to obtain more accurate feature embedding of target appearance. An appearance-enhanced non-maximum suppression is proposed to reduce over-suppression of true targets in crowded scenes. Based on the one-stage anchor-free network with the deformable local attention module and the task-aware prediction module, we implement a new online multiple target tracker. Experimental results show that our tracker achieves a very fast speed while maintaining a high tracking accuracy.
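The abstract does not say how the appearance-enhanced non-maximum suppression is computed, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes that a lower-scoring box is suppressed only when it both overlaps a kept box strongly and has a similar appearance embedding, so that two different targets standing close together are less likely to be merged. The function names, the cosine-similarity test, and the thresholds `iou_thr` and `sim_thr` are all hypothetical choices made for illustration.

```python
import numpy as np


def iou_one_to_many(box, boxes):
    """IoU between one box and an array of boxes, all in (x1, y1, x2, y2) form."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter + 1e-9)


def appearance_nms(boxes, scores, embeddings, iou_thr=0.5, sim_thr=0.7):
    """Greedy NMS that also checks appearance similarity (assumed rule).

    A candidate is dropped only if its IoU with a kept box exceeds iou_thr
    AND the cosine similarity of their embeddings exceeds sim_thr.
    Returns the indices of the kept boxes, highest score first.
    """
    # L2-normalise so that a dot product equals cosine similarity.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    emb = embeddings / (norms + 1e-9)

    order = np.argsort(-scores)  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        if rest.size == 0:
            break
        ious = iou_one_to_many(boxes[i], boxes[rest])
        sims = emb[rest] @ emb[i]
        # Plain NMS would suppress on IoU alone; here appearance must agree too.
        suppress = (ious > iou_thr) & (sims > sim_thr)
        order = rest[~suppress]
    return keep


if __name__ == "__main__":
    # Box 1 duplicates box 0 (same look); box 2 overlaps but looks different.
    boxes = np.array([[0, 0, 10, 10], [1, 0, 11, 10], [2, 0, 12, 10]], dtype=float)
    scores = np.array([0.9, 0.8, 0.7])
    embeddings = np.array([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]])
    print(appearance_nms(boxes, scores, embeddings))
```

In the toy data, IoU-only suppression at the same threshold would discard both lower-scoring boxes, whereas the appearance check keeps the third box because its embedding differs from that of the top-scoring detection.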


