{"title":"Synergistic-aware cascaded association and trajectory refinement for multi-object tracking","authors":"Hui Li, Su Qin, Saiyu Li, Ying Gao, Yanli Wu","doi":"10.1016/j.imavis.2025.105695","DOIUrl":null,"url":null,"abstract":"<div><div>Multi-object tracking (MOT) is a pivotal research area in computer vision. Effectively tracking objects in scenarios with frequent occlusions and crowded scenes has become a key challenge in MOT tasks. Existing tracking-by-detection (TbD) methods often rely on simple two-frame association techniques. However, in situations involving scale transformation or requiring long-term association, frequent occlusion between objects can lead to ID switches, especially in scenes with dense or highly intersecting objects. Therefore, we propose a synergistic-aware cascaded association and trajectory refinement method (SCTrack) for multi-object tracking. In the data association stage, we propose a synergistic-aware cascaded association method to construct a multi-perception affinity matrix for object association, and introduce the multi-frame collaborative distance calculation to enhance the robustness. To address the problem of trajectory fragmentation, we propose a dynamic confidence-driven trajectory refinement post-processing method. This method integrates confidence and feature information to calculate trajectory association, repair fragmented trajectories, and improve the overall robustness of the tracking algorithm. Extensive experiments on the MOT17, MOT20, and DanceTrack datasets validate SCTrack’s competitive performance.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"162 ","pages":"Article 105695"},"PeriodicalIF":4.2000,"publicationDate":"2025-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Image and Vision Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0262885625002835","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Abstract
Multi-object tracking (MOT) is a pivotal research area in computer vision. Effectively tracking objects through frequent occlusions and crowded scenes remains a key challenge in MOT tasks. Existing tracking-by-detection (TbD) methods often rely on simple two-frame association techniques. However, when objects undergo scale changes or require long-term association, frequent occlusion between objects can cause ID switches, especially in scenes with dense or highly intersecting objects. We therefore propose a synergistic-aware cascaded association and trajectory refinement method (SCTrack) for multi-object tracking. In the data association stage, we propose a synergistic-aware cascaded association method that constructs a multi-perception affinity matrix for object association and introduces a multi-frame collaborative distance calculation to enhance robustness. To address trajectory fragmentation, we propose a dynamic confidence-driven trajectory refinement post-processing method, which integrates confidence and feature information to compute trajectory associations, repair fragmented trajectories, and improve the overall robustness of the tracking algorithm. Extensive experiments on the MOT17, MOT20, and DanceTrack datasets validate SCTrack's competitive performance.
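To make the association idea concrete, the sketch below shows one generic way such a fused affinity could be assembled in a tracking-by-detection pipeline: an IoU-based motion affinity averaged over a track's last few frames (a stand-in for a "multi-frame" distance) blended with appearance cosine similarity, followed by Hungarian matching. This is an illustrative reading of the abstract, not the paper's actual formulation; the function names, the blending weight `w_motion`, the per-frame averaging, and the acceptance threshold `min_affinity` are all placeholder assumptions.

```python
# Illustrative sketch only (not the authors' SCTrack implementation).
import numpy as np
from scipy.optimize import linear_sum_assignment


def iou_matrix(track_boxes: np.ndarray, det_boxes: np.ndarray) -> np.ndarray:
    """Pairwise IoU between track boxes (T, 4) and detection boxes (D, 4) in xyxy format."""
    x1 = np.maximum(track_boxes[:, None, 0], det_boxes[None, :, 0])
    y1 = np.maximum(track_boxes[:, None, 1], det_boxes[None, :, 1])
    x2 = np.minimum(track_boxes[:, None, 2], det_boxes[None, :, 2])
    y2 = np.minimum(track_boxes[:, None, 3], det_boxes[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_t = (track_boxes[:, 2] - track_boxes[:, 0]) * (track_boxes[:, 3] - track_boxes[:, 1])
    area_d = (det_boxes[:, 2] - det_boxes[:, 0]) * (det_boxes[:, 3] - det_boxes[:, 1])
    return inter / (area_t[:, None] + area_d[None, :] - inter + 1e-9)


def cosine_matrix(track_feats: np.ndarray, det_feats: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between L2-normalised appearance embeddings."""
    t = track_feats / (np.linalg.norm(track_feats, axis=1, keepdims=True) + 1e-9)
    d = det_feats / (np.linalg.norm(det_feats, axis=1, keepdims=True) + 1e-9)
    return t @ d.T


def fused_affinity(per_frame_track_boxes, det_boxes, track_feats, det_feats, w_motion=0.6):
    """Average IoU affinity over the last K frames of each track, then blend with appearance."""
    motion = np.mean([iou_matrix(b, det_boxes) for b in per_frame_track_boxes], axis=0)
    appearance = cosine_matrix(track_feats, det_feats)
    return w_motion * motion + (1.0 - w_motion) * appearance


def associate(affinity: np.ndarray, min_affinity: float = 0.3):
    """Hungarian matching on the negated affinity; reject low-affinity pairs."""
    rows, cols = linear_sum_assignment(-affinity)
    return [(r, c) for r, c in zip(rows, cols) if affinity[r, c] >= min_affinity]
```

A trajectory-refinement post-processing step, as described in the abstract, could reuse the same ingredients offline: score candidate fragment pairs with a similar confidence- and feature-weighted affinity and merge those that exceed a threshold.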
Journal Introduction:
Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.