Preformer MOT: A transformer-based approach for multi-object tracking with global trajectory prediction

IF 6.3, CAS Zone 3 (multidisciplinary journals), JCR Q1 Multidisciplinary

Fundamental Research. Pub Date: 2026-01-01; Epub Date: 2025-01-30; DOI: 10.1016/j.fmre.2024.06.015

Yueying Wang, Yuhao Qing, Kaer Huang, Chuangyin Dang, Zhengtian Wu

Volume 6, Issue 1, Pages 423-431. Article page: https://www.sciencedirect.com/science/article/pii/S2667325824003601

Citations: 0

Abstract

Multi-Object Tracking (MOT) aims to determine accurately the positions and trajectories of moving objects in a video sequence. Prevailing methods link detected objects across successive frames using appearance and motion cues, and some approaches additionally incorporate implicit global correlations from multiple preceding frames to delineate target trajectories. However, predicting trajectories over multiple future frames remains insufficiently explored, which leaves much of the relevant information in MOT unused. To address this gap, we introduce a transformer-based method, termed Preformer MOT, which improves the precision of nonlinear trajectory prediction in dynamic scenes. It does so by combining a novel motion estimation technique, trajectory prediction, with Kalman filtering. Our method not only uses historical trajectory data but also anticipates the positions of target objects up to n steps ahead, yielding a comprehensive trajectory prediction with long-range temporal correlations. Specifically, we develop a simple self-supervised trajectory prediction model that estimates the future positions of a target object from its previously observed positions. During the data association phase, if a trajectory is disrupted by overlap, occlusion, or nonlinear motion of the detected objects, Preformer MOT can anticipate the target's positions over multiple forthcoming frames and use them early to reestablish trajectory continuity. Empirical evaluations on pedestrian datasets such as DanceTrack and MOT17 show that our approach surpasses other contemporary state-of-the-art methods. Furthermore, Preformer MOT achieves strong performance in complex marine environments, underscoring its adaptability and effectiveness.
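The abstract names two ingredients: a Kalman filter for frame-to-frame motion and a multi-step trajectory predictor used to reconnect a track after a disruption. The sketch below illustrates that division of labor under stated assumptions; it is not the authors' implementation. `AxisKF` is a standard constant-velocity Kalman filter run independently per image axis (a simplification). `extrapolate` is a hypothetical stand-in for the learned transformer predictor: a per-axis least-squares line fit rolled out n steps. `recover` is a hypothetical gating rule that compares new detections against the predicted position for the length of the gap, rather than against the last seen position. The noise values, the gate, and how the two components are fused in Preformer MOT are all assumptions, since the abstract does not specify them.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (x, y) box centre in pixels


@dataclass
class AxisKF:
    """1-D constant-velocity Kalman filter with state [position, velocity].

    One filter per image axis: a simplification that assumes independent
    x and y noise. Noise values below are illustrative assumptions.
    """
    x: float                      # position estimate
    v: float = 0.0                # velocity estimate (pixels per frame)
    P: List[List[float]] = field(default_factory=lambda: [[10.0, 0.0], [0.0, 10.0]])
    q: float = 0.01               # process-noise variance (assumed)
    r: float = 1.0                # measurement-noise variance (assumed)

    def predict(self) -> float:
        # x <- F x with F = [[1, 1], [0, 1]] (one frame per step)
        self.x += self.v
        (p00, p01), (p10, p11) = self.P
        # P <- F P F^T + q I
        self.P = [[p00 + p01 + p10 + p11 + self.q, p01 + p11],
                  [p10 + p11, p11 + self.q]]
        return self.x

    def update(self, z: float) -> None:
        # Only position is observed: H = [1, 0]
        (p00, p01), (p10, p11) = self.P
        s = p00 + self.r              # innovation variance
        k0, k1 = p00 / s, p10 / s     # Kalman gain
        y = z - self.x                # innovation
        self.x += k0 * y
        self.v += k1 * y
        # P <- (I - K H) P
        self.P = [[(1 - k0) * p00, (1 - k0) * p01],
                  [p10 - k1 * p00, p11 - k1 * p01]]


def extrapolate(history: List[Point], n: int) -> List[Point]:
    """Hypothetical stand-in for the learned predictor: fit a least-squares
    line per axis over the observed history and roll it out n steps ahead."""
    m = len(history)
    if m < 2:
        return [history[-1]] * n          # no motion information yet
    t_mean = (m - 1) / 2
    denom = sum((t - t_mean) ** 2 for t in range(m))
    coeffs = []
    for axis in range(2):
        vals = [p[axis] for p in history]
        mean = sum(vals) / m
        slope = sum((t - t_mean) * (v - mean) for t, v in zip(range(m), vals)) / denom
        coeffs.append((mean, slope))
    # Step k lies at time index (m - 1 + k) relative to the first observation.
    return [tuple(mean + slope * (m - 1 + k - t_mean) for mean, slope in coeffs)
            for k in range(1, n + 1)]


def recover(history: List[Point], gap: int, detections: List[Point],
            n: int = 10, gate: float = 3.0) -> Optional[int]:
    """Hypothetical re-linking rule: return the index of the detection closest
    to the position predicted `gap` frames ahead, if it lies within `gate`."""
    if not 1 <= gap <= n or not detections:
        return None
    px, py = extrapolate(history, n)[gap - 1]
    best = min(range(len(detections)),
               key=lambda i: (detections[i][0] - px) ** 2 + (detections[i][1] - py) ** 2)
    dx, dy = detections[best][0] - px, detections[best][1] - py
    return best if (dx * dx + dy * dy) ** 0.5 <= gate else None
```

For example, a track observed at `[(0, 0), (1, 2), (2, 4), (3, 6)]` and then missing for two frames is predicted at `(5, 10)`, so `recover(history, gap=2, detections=[(20, 20), (5.5, 10.5)])` re-links it to the second detection, whereas gating against the last seen box `(3, 6)` would miss it. A real tracker would replace the greedy nearest-detection choice with a global assignment over all lost tracks.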

