{"title":"基于激光雷达的车辆-基础设施合作端到端时间感知","authors":"Zhenwei Yang;Jilei Mao;Wenxian Yang;Yibo Ai;Yu Kong;Haibao Yu;Weidong Zhang","doi":"10.1109/JIOT.2025.3552526","DOIUrl":null,"url":null,"abstract":"Temporal perception, defined as the capability to detect and track objects across temporal sequences, serves as a fundamental component in autonomous driving systems. While single-vehicle perception systems encounter limitations, stemming from incomplete perception due to object occlusion and inherent blind spots, cooperative perception systems present their own challenges in terms of sensor calibration precision and positioning accuracy. To address these issues, we introduce LET-VIC, a LiDAR-based End-to-End Tracking framework for vehicle-infrastructure cooperation (VIC). First, we employ Temporal Self-Attention and VIC cross-attention modules to effectively integrate temporal and spatial information from both vehicle and infrastructure perspectives. Then, we develop a novel calibration error compensation (CEC) module to mitigate sensor misalignment issues and facilitate accurate feature alignment. Experiments on the vehicle-to-everything-Seq-SPD dataset demonstrate that LET-VIC significantly outperforms baseline models. Compared to LET-V, LET-VIC achieves +15.0% improvement in mean average precision (mAP) and a +17.3% improvement in average multiobject tracking accuracy (AMOTA). Furthermore, LET-VIC surpasses representative Tracking by Detection models, including V2VNet, FFNet, and PointPillars, with at least a +13.7% improvement in mAP and a +13.1% improvement in AMOTA without considering communication delays, showcasing its robust detection and tracking performance. The experiments demonstrate that the integration of multiview perspectives, temporal sequences, or CEC in end-to-end training significantly improves both detection and tracking performance. All code will be open-sourced.","PeriodicalId":54347,"journal":{"name":"IEEE Internet of Things Journal","volume":"12 13","pages":"22862-22874"},"PeriodicalIF":8.9000,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"LiDAR-Based End-to-End Temporal Perception for Vehicle-Infrastructure Cooperation\",\"authors\":\"Zhenwei Yang;Jilei Mao;Wenxian Yang;Yibo Ai;Yu Kong;Haibao Yu;Weidong Zhang\",\"doi\":\"10.1109/JIOT.2025.3552526\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Temporal perception, defined as the capability to detect and track objects across temporal sequences, serves as a fundamental component in autonomous driving systems. While single-vehicle perception systems encounter limitations, stemming from incomplete perception due to object occlusion and inherent blind spots, cooperative perception systems present their own challenges in terms of sensor calibration precision and positioning accuracy. To address these issues, we introduce LET-VIC, a LiDAR-based End-to-End Tracking framework for vehicle-infrastructure cooperation (VIC). First, we employ Temporal Self-Attention and VIC cross-attention modules to effectively integrate temporal and spatial information from both vehicle and infrastructure perspectives. Then, we develop a novel calibration error compensation (CEC) module to mitigate sensor misalignment issues and facilitate accurate feature alignment. Experiments on the vehicle-to-everything-Seq-SPD dataset demonstrate that LET-VIC significantly outperforms baseline models. 
Compared to LET-V, LET-VIC achieves +15.0% improvement in mean average precision (mAP) and a +17.3% improvement in average multiobject tracking accuracy (AMOTA). Furthermore, LET-VIC surpasses representative Tracking by Detection models, including V2VNet, FFNet, and PointPillars, with at least a +13.7% improvement in mAP and a +13.1% improvement in AMOTA without considering communication delays, showcasing its robust detection and tracking performance. The experiments demonstrate that the integration of multiview perspectives, temporal sequences, or CEC in end-to-end training significantly improves both detection and tracking performance. All code will be open-sourced.\",\"PeriodicalId\":54347,\"journal\":{\"name\":\"IEEE Internet of Things Journal\",\"volume\":\"12 13\",\"pages\":\"22862-22874\"},\"PeriodicalIF\":8.9000,\"publicationDate\":\"2025-03-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Internet of Things Journal\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10930920/\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Internet of Things Journal","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10930920/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
LiDAR-Based End-to-End Temporal Perception for Vehicle-Infrastructure Cooperation
Temporal perception, defined as the capability to detect and track objects across temporal sequences, is a fundamental component of autonomous driving systems. Single-vehicle perception systems are limited by incomplete coverage caused by object occlusion and inherent blind spots, while cooperative perception systems face their own challenges in sensor calibration precision and positioning accuracy. To address these issues, we introduce LET-VIC, a LiDAR-based End-to-End Tracking framework for vehicle-infrastructure cooperation (VIC). First, we employ temporal self-attention and VIC cross-attention modules to effectively integrate temporal and spatial information from both the vehicle and infrastructure perspectives. Then, we develop a novel calibration error compensation (CEC) module to mitigate sensor misalignment and enable accurate feature alignment. Experiments on the V2X-Seq-SPD (vehicle-to-everything) dataset demonstrate that LET-VIC significantly outperforms baseline models. Compared to LET-V, LET-VIC achieves a +15.0% improvement in mean average precision (mAP) and a +17.3% improvement in average multi-object tracking accuracy (AMOTA). Furthermore, without considering communication delays, LET-VIC surpasses representative tracking-by-detection models, including V2VNet, FFNet, and PointPillars, by at least +13.7% in mAP and +13.1% in AMOTA, showcasing its robust detection and tracking performance. The experiments demonstrate that integrating multi-view perspectives, temporal sequences, or CEC in end-to-end training significantly improves both detection and tracking performance. All code will be open-sourced.
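The paper's code is not reproduced here; as a rough, hypothetical sketch of the fusion step the abstract describes, the following PyTorch snippet shows how temporal self-attention over the ego vehicle's feature history could be combined with cross-attention into infrastructure features. All names (VICFusion, veh_bev, infra_bev, the single-layer design, and the token shapes) are illustrative assumptions, not the authors' implementation, and the CEC module is only indicated as a comment.

    # Hypothetical sketch, not the authors' code: fuse vehicle and
    # infrastructure bird's-eye-view (BEV) feature tokens with temporal
    # self-attention and VIC cross-attention.
    import torch
    import torch.nn as nn

    class VICFusion(nn.Module):
        def __init__(self, dim=256, heads=8):
            super().__init__()
            # Temporal self-attention: current vehicle BEV tokens attend
            # to BEV tokens from previous frames of the same stream.
            self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            # VIC cross-attention: vehicle tokens query infrastructure
            # tokens, injecting the roadside view into the ego features.
            self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.norm1 = nn.LayerNorm(dim)
            self.norm2 = nn.LayerNorm(dim)

        def forward(self, veh_bev, veh_hist, infra_bev):
            # veh_bev:   (B, N, C) flattened current vehicle BEV tokens
            # veh_hist:  (B, M, C) flattened BEV tokens from past frames
            # infra_bev: (B, N, C) infrastructure BEV tokens, assumed to be
            #            warped into the vehicle frame; in the paper the CEC
            #            module would correct residual misalignment here
            t, _ = self.temporal_attn(veh_bev, veh_hist, veh_hist)
            x = self.norm1(veh_bev + t)
            c, _ = self.cross_attn(x, infra_bev, infra_bev)
            return self.norm2(x + c)

    # Usage on dummy tensors: batch of 2, a 32x32 BEV grid -> 1024 tokens.
    fusion = VICFusion()
    veh = torch.randn(2, 1024, 256)
    hist = torch.randn(2, 2048, 256)   # two past frames concatenated
    infra = torch.randn(2, 1024, 256)
    out = fusion(veh, hist, infra)     # -> (2, 1024, 256)

Keeping the vehicle tokens as the queries in both attention steps means the output stays in the ego coordinate frame, which matches the abstract's emphasis on aligning infrastructure features to the vehicle rather than the reverse.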
Journal introduction:
The IEEE Internet of Things (IoT) Journal publishes articles and review articles covering various aspects of IoT, including IoT system architecture, IoT enabling technologies, IoT communication and networking protocols such as network coding, and IoT services and applications. Topics encompass IoT's impact on sensor technologies, big data management, and future Internet design for applications such as smart cities and smart homes. Fields of interest include IoT architecture, such as things-centric, data-centric, and service-oriented IoT architectures; IoT enabling technologies and systematic integration, such as sensor technologies, big sensor data management, and future Internet design for IoT; IoT services, applications, and test-beds, such as IoT service middleware, IoT application programming interfaces (APIs), IoT application design, and IoT trials/experiments; and IoT standardization activities and technology development in different standards development organizations (SDOs), such as IEEE, IETF, ITU, 3GPP, and ETSI.