Authors: Jun Miao, Maoxuan Zhang, Yuanhua Qiao
Journal: Engineering Applications of Artificial Intelligence, volume 163, Article 112666 (impact factor 8.0; JCR Q1, Automation & Control Systems)
Published: 2025-10-22
Publication type: Journal Article
DOI: 10.1016/j.engappai.2025.112666
URL: https://www.sciencedirect.com/science/article/pii/S0952197625026971
Citations: 0
Abstract
A new multi-object tracking algorithm based on Sparse Detection Transformer
Multi-object tracking (MOT) is crucial for intelligent surveillance and autonomous driving. However, existing Transformer-based methods often suffer from an accuracy-efficiency trade-off due to high computational complexity, limiting real-time applicability. To address this, we propose SparseDeTrack (Sparse Detection Tracking), an efficient MOT framework based on the tracking-by-detection (TBD) paradigm. In detection, we employ a sparse token Transformer with a 30 % token retention rate, effectively reducing computational cost while retaining essential features. In tracking, we remove the Re-Identification (ReID) module and enhance the Extended Kalman Filter (EKF) by directly predicting the width and height instead of the aspect ratio of bounding boxes, improving both localization accuracy and nonlinear motion modeling. Furthermore, ByteTrack (Multi-Object Tracking by Associating Every Detection Box) is integrated for secondary association, increasing robustness under occlusion. We conduct extensive experiments on MOTChallenge 17 (MOT17), MOTChallenge 20 (MOT20), and DanceTrack benchmarks. On the MOT17 test set, SparseDeTrack achieves a Multiple Object Tracking Accuracy (MOTA) of 75.4, outperforming Transformer-based methods such as MOTR (End-to-End Multiple-Object Tracking with Transformer), Trackformer (Multi-Object Tracking with Transformers), and TransTrack (Multiple Object Tracking with Transformer) by 2.0, 1.3, and 0.2 points, respectively, while attaining a high inference speed of 44.5 frames per second (FPS), balancing accuracy and efficiency. It reaches 65.6 MOTA on crowded MOT20 and 89.1 MOTA on nonlinear-motion DanceTrack, comparable to state-of-the-art methods. These results confirm that SparseDeTrack delivers both high-precision tracking and real-time inference in complex scenarios, making it a promising solution for real-world applications in intelligent surveillance and autonomous driving.
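The secondary association step mentioned in the abstract can be illustrated with a small sketch of ByteTrack-style two-stage matching. This is not the authors' implementation: the function names, the threshold values (`high_thresh`, `low_thresh`, `min_iou`), and the box format are assumptions chosen for illustration, and greedy IoU matching stands in for the Hungarian assignment that ByteTrack uses. Track boxes are assumed to be the positions already predicted by the motion model.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks: Dict[int, Box], dets: Dict[int, Box],
                 min_iou: float) -> Dict[int, int]:
    """Pair tracks with detections in order of decreasing IoU.

    Simplification: ByteTrack solves this step with the Hungarian
    algorithm; greedy matching keeps the sketch dependency-free.
    """
    pairs = sorted(
        ((iou(tb, db), t, d) for t, tb in tracks.items() for d, db in dets.items()),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used_dets = set()
    for score, t, d in pairs:
        if score < min_iou:
            break  # all remaining pairs overlap too little
        if t in matches or d in used_dets:
            continue
        matches[t] = d
        used_dets.add(d)
    return matches


def two_stage_associate(tracks: Dict[int, Box],
                        detections: List[Tuple[Box, float]],
                        high_thresh: float = 0.6,   # illustrative value
                        low_thresh: float = 0.1,    # illustrative value
                        min_iou: float = 0.3):      # illustrative value
    """Match tracks to high-score detections first, then give the
    still-unmatched tracks a second chance on low-score detections."""
    high = {i: box for i, (box, s) in enumerate(detections) if s >= high_thresh}
    low = {i: box for i, (box, s) in enumerate(detections)
           if low_thresh <= s < high_thresh}

    first = greedy_match(tracks, high, min_iou)
    remaining = {t: b for t, b in tracks.items() if t not in first}
    second = greedy_match(remaining, low, min_iou)

    matches = {**first, **second}
    unmatched_tracks = [t for t in tracks if t not in matches]
    # Only unmatched high-score detections are candidates for new tracks.
    new_track_candidates = [d for d in high if d not in first.values()]
    return matches, unmatched_tracks, new_track_candidates


tracks = {1: (0, 0, 10, 10), 2: (20, 20, 30, 30)}
detections = [((1, 1, 11, 11), 0.9),    # confident detection near track 1
              ((21, 21, 31, 31), 0.3),  # low-score (e.g. occluded) detection near track 2
              ((50, 50, 60, 60), 0.8)]  # confident detection far from every track
matches, unmatched, candidates = two_stage_associate(tracks, detections)
print(matches, unmatched, candidates)
```

In this toy frame, track 2 would be lost if low-score detections were discarded; the second pass recovers it, which is the intuition behind the robustness under occlusion that the abstract attributes to secondary association.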
About the journal:
Artificial Intelligence (AI) is pivotal in driving the fourth industrial revolution, witnessing remarkable advancements across various machine learning methodologies. AI techniques have become indispensable tools for practicing engineers, enabling them to tackle previously insurmountable challenges. Engineering Applications of Artificial Intelligence serves as a global platform for the swift dissemination of research elucidating the practical application of AI methods across all engineering disciplines. Submitted papers are expected to present novel aspects of AI utilized in real-world engineering applications, validated using publicly available datasets to ensure the replicability of research outcomes. Join us in exploring the transformative potential of AI in engineering.