Wen Zhang;Zhexuan Sun;Shengrong Lv;Konghao Mei;Guangkun Chen;Zhenya Yang
{"title":"MFV3DL:未来车辆三维定位的单目视觉方法","authors":"Wen Zhang;Zhexuan Sun;Shengrong Lv;Konghao Mei;Guangkun Chen;Zhenya Yang","doi":"10.1109/TITS.2025.3572742","DOIUrl":null,"url":null,"abstract":"Vision-based future vehicle localization provides intuitive trajectory prediction, serving as a critical foundation for Advanced Driving Assistance Systems (ADAS) to formulate collision avoidance decisions. Among existing approaches, ego-view trajectory prediction has proven effective for driver monitoring and intervention in vision-based localization. This method aligns closely with human perceptual processing, making it essential for the Driver-in-the-Loop (DIL) development stage in modern ADAS. However, most existing ego-view trajectory prediction approaches rely on two-dimensional image-based predictions, creating a gap with human three-dimensional perception. This disparity negatively impacts the accuracy and timeliness of driver decision-making and intervention. In this paper, we propose MFV3DL (Monocular Vision Method for Future Vehicle 3D Localization), a dual-stream framework integrating 2D image trajectory prediction and depth prediction to achieve future vehicle 3D localization. To enhance accuracy, we leverage Multi-Object Tracking and Segmentation (MOTS) results and depth estimation as inputs for the dual-stream architecture. Additionally, we introduce a Related Information Fusion (RIF) unit to enable cross-modal interaction between the two streams. For depth stream predictions, we propose a ConvLSTM-based depth prediction method. Experimental results on the KITTI dataset demonstrate that MFV3DL outperforms state-of-the-art methods. In diverse driving scenarios, MFV3DL achieves superior 3D visualization results compared to 2D trajectory-based predictions. 
Baseline comparisons and ablation studies further validate that the proposed ConvLSTM-based depth prediction enhances the dual-stream architecture and RIF unit for 3D localization tasks.","PeriodicalId":13416,"journal":{"name":"IEEE Transactions on Intelligent Transportation Systems","volume":"26 7","pages":"9277-9292"},"PeriodicalIF":7.9000,"publicationDate":"2025-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"MFV3DL: Monocular Vision Method for Future Vehicle 3D Localization\",\"authors\":\"Wen Zhang;Zhexuan Sun;Shengrong Lv;Konghao Mei;Guangkun Chen;Zhenya Yang\",\"doi\":\"10.1109/TITS.2025.3572742\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Vision-based future vehicle localization provides intuitive trajectory prediction, serving as a critical foundation for Advanced Driving Assistance Systems (ADAS) to formulate collision avoidance decisions. Among existing approaches, ego-view trajectory prediction has proven effective for driver monitoring and intervention in vision-based localization. This method aligns closely with human perceptual processing, making it essential for the Driver-in-the-Loop (DIL) development stage in modern ADAS. However, most existing ego-view trajectory prediction approaches rely on two-dimensional image-based predictions, creating a gap with human three-dimensional perception. This disparity negatively impacts the accuracy and timeliness of driver decision-making and intervention. In this paper, we propose MFV3DL (Monocular Vision Method for Future Vehicle 3D Localization), a dual-stream framework integrating 2D image trajectory prediction and depth prediction to achieve future vehicle 3D localization. To enhance accuracy, we leverage Multi-Object Tracking and Segmentation (MOTS) results and depth estimation as inputs for the dual-stream architecture. 
Additionally, we introduce a Related Information Fusion (RIF) unit to enable cross-modal interaction between the two streams. For depth stream predictions, we propose a ConvLSTM-based depth prediction method. Experimental results on the KITTI dataset demonstrate that MFV3DL outperforms state-of-the-art methods. In diverse driving scenarios, MFV3DL achieves superior 3D visualization results compared to 2D trajectory-based predictions. Baseline comparisons and ablation studies further validate that the proposed ConvLSTM-based depth prediction enhances the dual-stream architecture and RIF unit for 3D localization tasks.\",\"PeriodicalId\":13416,\"journal\":{\"name\":\"IEEE Transactions on Intelligent Transportation Systems\",\"volume\":\"26 7\",\"pages\":\"9277-9292\"},\"PeriodicalIF\":7.9000,\"publicationDate\":\"2025-06-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Intelligent Transportation Systems\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11021523/\",\"RegionNum\":1,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, CIVIL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Intelligent Transportation Systems","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/11021523/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, CIVIL","Score":null,"Total":0}
Citations: 0
Abstract
MFV3DL: Monocular Vision Method for Future Vehicle 3D Localization
Vision-based future vehicle localization provides intuitive trajectory prediction, serving as a critical foundation for Advanced Driving Assistance Systems (ADAS) to formulate collision avoidance decisions. Among existing approaches, ego-view trajectory prediction has proven effective for driver monitoring and intervention in vision-based localization. This method aligns closely with human perceptual processing, making it essential for the Driver-in-the-Loop (DIL) development stage in modern ADAS. However, most existing ego-view trajectory prediction approaches rely on two-dimensional image-based predictions, creating a gap with human three-dimensional perception. This disparity negatively impacts the accuracy and timeliness of driver decision-making and intervention. In this paper, we propose MFV3DL (Monocular Vision Method for Future Vehicle 3D Localization), a dual-stream framework integrating 2D image trajectory prediction and depth prediction to achieve future vehicle 3D localization. To enhance accuracy, we leverage Multi-Object Tracking and Segmentation (MOTS) results and depth estimation as inputs for the dual-stream architecture. Additionally, we introduce a Related Information Fusion (RIF) unit to enable cross-modal interaction between the two streams. For depth stream predictions, we propose a ConvLSTM-based depth prediction method. Experimental results on the KITTI dataset demonstrate that MFV3DL outperforms state-of-the-art methods. In diverse driving scenarios, MFV3DL achieves superior 3D visualization results compared to 2D trajectory-based predictions. Baseline comparisons and ablation studies further validate that the proposed ConvLSTM-based depth prediction enhances the dual-stream architecture and RIF unit for 3D localization tasks.
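The abstract's core idea is to pair the 2D image-trajectory stream with the depth-prediction stream to place a future vehicle in 3D. As a minimal illustration (not the paper's implementation), once the two streams yield a predicted pixel position (u, v) and a metric depth d for a future frame, a standard pinhole back-projection recovers the vehicle's position in the camera frame. The intrinsics below are illustrative KITTI-like values; the `backproject` helper is hypothetical:

```python
def backproject(u, v, depth, fx=721.5, fy=721.5, cx=609.6, cy=172.9):
    """Map a pixel (u, v) with metric depth to a camera-frame point (X, Y, Z).

    fx, fy: focal lengths in pixels; cx, cy: principal point.
    Values shown approximate the KITTI color-camera calibration and are
    for illustration only.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

# Example: a vehicle predicted slightly right of the principal point, 15 m ahead.
pos = backproject(650.0, 180.0, 15.0)
```

This sketch also makes the abstract's motivation concrete: errors in either stream (the 2D position or the depth) propagate directly into the 3D estimate, which is why the paper fuses the two streams through the RIF unit rather than predicting them independently.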
Journal description:
The theoretical, experimental and operational aspects of electrical and electronics engineering and information technologies as applied to Intelligent Transportation Systems (ITS). Intelligent Transportation Systems are defined as those systems utilizing synergistic technologies and systems engineering concepts to develop and improve transportation systems of all kinds. The scope of this interdisciplinary activity includes the promotion, consolidation and coordination of ITS technical activities among IEEE entities, and providing a focus for cooperative activities, both internally and externally.