Preformer MOT: A transformer-based approach for multi-object tracking with global trajectory prediction
Yueying Wang, Yuhao Qing, Kaer Huang, Chuangyin Dang, Zhengtian Wu
Fundamental Research, vol. 6, no. 1, pp. 423-431, 2026 (published online 30 January 2025)
DOI: 10.1016/j.fmre.2024.06.015
https://www.sciencedirect.com/science/article/pii/S2667325824003601
Abstract
Multi-Object Tracking (MOT) aims to determine the positions and trajectories of moving objects in a video sequence. Prevalent methods link detected objects across successive frames using appearance and motion cues, and some also exploit implicit global correlations across multiple preceding frames to delineate target trajectories. However, predicting trajectories over multiple future frames remains insufficiently explored, so information relevant to MOT is largely left unused. To address this gap, we introduce a transformer-based method, termed Preformer MOT, which improves the accuracy of nonlinear trajectory prediction in dynamic scenes. It achieves this by combining a new motion estimation technique, trajectory prediction, with Kalman filtering. Our method not only uses historical trajectory data but also predicts the future positions of target objects up to n subsequent steps, yielding trajectory predictions with long-range temporal correlations. Specifically, we develop a simple self-supervised trajectory prediction model that estimates the future positions of a target object from its previously observed positions. During the association phase, if a trajectory is broken by overlap, occlusion, or nonlinear motion of the detected objects, Preformer MOT uses its predictions for multiple upcoming frames to re-establish trajectory continuity. Empirical evaluations on pedestrian datasets including DanceTrack and MOT17 show that our approach outperforms other contemporary state-of-the-art methods. Preformer MOT also performs strongly in complex marine environments, underscoring its adaptability.
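The abstract does not specify how the multi-step predictions enter the association step, so the following is only a minimal sketch of the general idea under stated assumptions: a track that has gone m frames without a matched detection is compared against its (m+1)-th predicted position rather than only a one-step prediction, which lets it be re-attached after a short occlusion. Everything here is an assumption, not the authors' code: `predict_future` is a hypothetical least-squares constant-velocity extrapolator standing in for the paper's transformer predictor; `Track`, `associate`, the greedy nearest-neighbour matching and the `gate` distance threshold are illustrative choices; `n_steps` plays the role of the horizon n. The Kalman filtering component, appearance cues and the paper's actual matching cost are not modeled.

```python
from dataclasses import dataclass, field
from math import hypot
from typing import List, Tuple

Point = Tuple[float, float]


def predict_future(history: List[Point], n_steps: int) -> List[Point]:
    """HYPOTHETICAL stand-in for Preformer MOT's transformer predictor.

    Fits x(t) = a + b*t and y(t) = c + d*t to the observed positions by least
    squares and extrapolates n_steps frames past the last observation.
    """
    T = len(history)
    if T == 1:
        # A single observation carries no velocity: assume the object stays put.
        return [history[0]] * n_steps
    t_mean = (T - 1) / 2
    denom = sum((t - t_mean) ** 2 for t in range(T))
    coeffs = []  # (intercept, slope) for x, then for y
    for dim in range(2):
        vals = [p[dim] for p in history]
        v_mean = sum(vals) / T
        slope = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(vals)) / denom
        coeffs.append((v_mean - slope * t_mean, slope))
    # Prediction step k (1..n_steps) lies at time index T - 1 + k.
    return [tuple(a + b * (T - 1 + k) for a, b in coeffs) for k in range(1, n_steps + 1)]


@dataclass
class Track:
    track_id: int
    history: List[Point]  # one position per frame, oldest first
    n_steps: int          # prediction horizon (the "n" of the abstract)
    missed: int = 0       # consecutive frames without a matched detection
    future: List[Point] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.future = predict_future(self.history, self.n_steps)

    def expected_position(self) -> Point:
        # After `missed` unmatched frames, the current frame is prediction
        # step missed + 1, stored at index `missed`.
        return self.future[self.missed]

    def update(self, det: Point) -> None:
        # Back-fill the missed frames with their predicted positions so the
        # history stays one entry per frame, then append the new detection.
        self.history.extend(self.future[: self.missed])
        self.history.append(det)
        self.missed = 0
        self.future = predict_future(self.history, self.n_steps)


def associate(tracks: List[Track], detections: List[Point], gate: float) -> List[Track]:
    """Greedy nearest-neighbour association for one frame (illustrative only).

    Each detection is matched to at most one track by distance to the track's
    expected position; tracks unmatched for n_steps frames are dropped.
    Spawning new tracks from unmatched detections is omitted.
    """
    expected = [tr.expected_position() for tr in tracks]
    pairs = sorted(
        (hypot(d[0] - e[0], d[1] - e[1]), ti, di)
        for ti, e in enumerate(expected)
        for di, d in enumerate(detections)
    )
    matched_t, matched_d = set(), set()
    for dist, ti, di in pairs:
        if dist > gate:
            break  # pairs are sorted, so every remaining pair is too far
        if ti in matched_t or di in matched_d:
            continue
        tracks[ti].update(detections[di])
        matched_t.add(ti)
        matched_d.add(di)
    for ti, tr in enumerate(tracks):
        if ti not in matched_t:
            tr.missed += 1
    # Beyond the horizon there is no prediction left to match against.
    return [tr for tr in tracks if tr.missed < tr.n_steps]


if __name__ == "__main__":
    # A pedestrian walking +1 in x per frame is occluded for two frames.
    tracks = [Track(track_id=1, history=[(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)], n_steps=5)]
    for frame_dets in ([], [], [(5.1, 0.0)]):
        tracks = associate(tracks, frame_dets, gate=0.5)
    print(tracks[0].track_id, tracks[0].history[-1])  # 1 (5.1, 0.0)
```

In the demo, the detection that reappears at x = 5.1 is matched to the third-step prediction (5.0, 0.0) and keeps identity 1. A matcher that compared only against the one-step prediction (3.0, 0.0) would see it 2.1 units away, outside the 0.5 gate. This is the kind of gap the abstract says multi-frame prediction is meant to bridge.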