{"title":"MMTraP: Multi-Sensor Multi-Agent Trajectory Prediction in BEV","authors":"Sushil Sharma;Arindam Das;Ganesh Sistu;Mark Halton;Ciarán Eising","doi":"10.1109/OJVT.2025.3574385","DOIUrl":null,"url":null,"abstract":"Accurate detection and trajectory prediction of moving vehicles are essential for motion planning in autonomous driving systems. While traffic regulations provide clear boundaries, real-world scenarios remain unpredictable due to the complex interactions between vehicles. This challenge has driven significant interest in learning-based approaches for trajectory prediction. We present <bold>MMTraP:</b> <bold>M</b>ulti-Sensor and <bold>M</b>ulti-Agent <bold>Tra</b>jectory <bold>P</b>rediction in BEV. This method integrates camera, LiDAR, and radar data to create detailed Bird's-Eye-View representations of driving scenes. Our approach employs a hierarchical vector transformer architecture that first detects and classifies vehicle motion patterns before predicting future trajectories through spatiotemporal relationship modeling. This work specifically focuses on vehicle interactions and environmental constraints. Despite its significance, multi-agent trajectory prediction and moving object segmentation are still underexplored in the literature, especially in real-time applications. Our method leverages multisensor fusion to obtain precise BEV representations and predict vehicle trajectories. Our multi-sensor fusion approach achieves the highest vehicle Intersection over Union (IoU) of 63.23% and an overall mean IoU (mIoU) of 64.63%, demonstrating its effectiveness in utilizing all available sensor modalities. Additionally, we demonstrate vehicle segmentation and trajectory prediction capabilities across various lighting and weather conditions. The proposed approach has been rigorously evaluated using the nuScenes dataset. Results show that our method improves the accuracy of trajectory predictions and outperforms state-of-the-art techniques, particularly in challenging environments such as congested urban areas. For instance, in complex traffic scenarios, our approach achieves a relative improvement of 5% in trajectory prediction accuracy compared to baseline methods. This work advances vehicle-focused prediction systems by integrating multi-sensor BEV representation and interaction-aware transformers. Our approach shows promise in enhancing the reliability and accuracy of trajectory predictions for autonomous driving applications, potentially improving overall safety and efficiency in diverse driving environments.","PeriodicalId":34270,"journal":{"name":"IEEE Open Journal of Vehicular Technology","volume":"6 ","pages":"1551-1567"},"PeriodicalIF":5.3000,"publicationDate":"2025-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11016806","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Open Journal of Vehicular Technology","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/11016806/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Abstract
Accurate detection and trajectory prediction of moving vehicles are essential for motion planning in autonomous driving systems. While traffic regulations provide clear boundaries, real-world scenarios remain unpredictable due to complex interactions between vehicles. This challenge has driven significant interest in learning-based approaches to trajectory prediction. We present MMTraP: Multi-Sensor Multi-Agent Trajectory Prediction in Bird's-Eye-View (BEV). The method integrates camera, LiDAR, and radar data to create detailed BEV representations of driving scenes. Our approach employs a hierarchical vector transformer architecture that first detects and classifies vehicle motion patterns and then predicts future trajectories by modeling spatiotemporal relationships, with a specific focus on vehicle interactions and environmental constraints. Despite their importance, multi-agent trajectory prediction and moving-object segmentation remain underexplored in the literature, especially for real-time applications. Our method leverages multi-sensor fusion to obtain precise BEV representations and to predict vehicle trajectories. This fusion approach achieves the highest vehicle Intersection over Union (IoU) of 63.23% and an overall mean IoU (mIoU) of 64.63%, demonstrating its effectiveness in exploiting all available sensor modalities. We further demonstrate vehicle segmentation and trajectory prediction across varied lighting and weather conditions. The proposed approach is rigorously evaluated on the nuScenes dataset. Results show that our method improves trajectory prediction accuracy and outperforms state-of-the-art techniques, particularly in challenging environments such as congested urban areas; in complex traffic scenarios, for instance, it achieves a 5% relative improvement in trajectory prediction accuracy over baseline methods. This work advances vehicle-focused prediction systems by integrating multi-sensor BEV representation with interaction-aware transformers. Our approach shows promise in enhancing the reliability and accuracy of trajectory predictions for autonomous driving applications, potentially improving overall safety and efficiency in diverse driving environments.
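
For context on the reported 63.23% vehicle IoU and 64.63% mIoU, the sketch below shows how per-class IoU and mean IoU are conventionally computed for BEV segmentation grids. This is a generic metric illustration, not the authors' evaluation code; the class list, grid size, and toy data are assumptions made for the example.

```python
import numpy as np

def per_class_iou(pred: np.ndarray, target: np.ndarray, num_classes: int) -> np.ndarray:
    """Compute IoU per class over integer-labeled BEV grids.

    pred, target: arrays of shape (H, W) holding class indices.
    Classes absent from both prediction and ground truth yield NaN,
    so they can be excluded when averaging into mIoU.
    """
    ious = np.full(num_classes, np.nan)
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union > 0:
            intersection = np.logical_and(pred_c, target_c).sum()
            ious[c] = intersection / union
    return ious

# Illustrative usage on a toy 200x200 BEV grid with two hypothetical
# classes: 0 = background, 1 = vehicle (the paper's 63.23% figure is
# the vehicle-class IoU; 64.63% is the mean over classes).
rng = np.random.default_rng(0)
pred = rng.integers(0, 2, size=(200, 200))
target = rng.integers(0, 2, size=(200, 200))
ious = per_class_iou(pred, target, num_classes=2)
miou = np.nanmean(ious)  # mIoU: average IoU across classes present
print(f"vehicle IoU: {ious[1]:.4f}, mIoU: {miou:.4f}")
```

On real data, pred and target would be the model's BEV segmentation output and the rasterized ground-truth annotations for one sample, with IoU accumulated over the whole evaluation split rather than per frame.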