{"title":"基于激光雷达的车辆-基础设施合作端到端时间感知","authors":"Zhenwei Yang;Jilei Mao;Wenxian Yang;Yibo Ai;Yu Kong;Haibao Yu;Weidong Zhang","doi":"10.1109/JIOT.2025.3552526","DOIUrl":null,"url":null,"abstract":"Temporal perception, defined as the capability to detect and track objects across temporal sequences, serves as a fundamental component in autonomous driving systems. While single-vehicle perception systems encounter limitations, stemming from incomplete perception due to object occlusion and inherent blind spots, cooperative perception systems present their own challenges in terms of sensor calibration precision and positioning accuracy. To address these issues, we introduce LET-VIC, a LiDAR-based End-to-End Tracking framework for vehicle-infrastructure cooperation (VIC). First, we employ Temporal Self-Attention and VIC cross-attention modules to effectively integrate temporal and spatial information from both vehicle and infrastructure perspectives. Then, we develop a novel calibration error compensation (CEC) module to mitigate sensor misalignment issues and facilitate accurate feature alignment. Experiments on the vehicle-to-everything-Seq-SPD dataset demonstrate that LET-VIC significantly outperforms baseline models. Compared to LET-V, LET-VIC achieves +15.0% improvement in mean average precision (mAP) and a +17.3% improvement in average multiobject tracking accuracy (AMOTA). Furthermore, LET-VIC surpasses representative Tracking by Detection models, including V2VNet, FFNet, and PointPillars, with at least a +13.7% improvement in mAP and a +13.1% improvement in AMOTA without considering communication delays, showcasing its robust detection and tracking performance. The experiments demonstrate that the integration of multiview perspectives, temporal sequences, or CEC in end-to-end training significantly improves both detection and tracking performance. All code will be open-sourced.","PeriodicalId":54347,"journal":{"name":"IEEE Internet of Things Journal","volume":"12 13","pages":"22862-22874"},"PeriodicalIF":8.9000,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"LiDAR-Based End-to-End Temporal Perception for Vehicle-Infrastructure Cooperation\",\"authors\":\"Zhenwei Yang;Jilei Mao;Wenxian Yang;Yibo Ai;Yu Kong;Haibao Yu;Weidong Zhang\",\"doi\":\"10.1109/JIOT.2025.3552526\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Temporal perception, defined as the capability to detect and track objects across temporal sequences, serves as a fundamental component in autonomous driving systems. While single-vehicle perception systems encounter limitations, stemming from incomplete perception due to object occlusion and inherent blind spots, cooperative perception systems present their own challenges in terms of sensor calibration precision and positioning accuracy. To address these issues, we introduce LET-VIC, a LiDAR-based End-to-End Tracking framework for vehicle-infrastructure cooperation (VIC). First, we employ Temporal Self-Attention and VIC cross-attention modules to effectively integrate temporal and spatial information from both vehicle and infrastructure perspectives. Then, we develop a novel calibration error compensation (CEC) module to mitigate sensor misalignment issues and facilitate accurate feature alignment. Experiments on the vehicle-to-everything-Seq-SPD dataset demonstrate that LET-VIC significantly outperforms baseline models. 
Compared to LET-V, LET-VIC achieves +15.0% improvement in mean average precision (mAP) and a +17.3% improvement in average multiobject tracking accuracy (AMOTA). Furthermore, LET-VIC surpasses representative Tracking by Detection models, including V2VNet, FFNet, and PointPillars, with at least a +13.7% improvement in mAP and a +13.1% improvement in AMOTA without considering communication delays, showcasing its robust detection and tracking performance. The experiments demonstrate that the integration of multiview perspectives, temporal sequences, or CEC in end-to-end training significantly improves both detection and tracking performance. All code will be open-sourced.\",\"PeriodicalId\":54347,\"journal\":{\"name\":\"IEEE Internet of Things Journal\",\"volume\":\"12 13\",\"pages\":\"22862-22874\"},\"PeriodicalIF\":8.9000,\"publicationDate\":\"2025-03-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Internet of Things Journal\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10930920/\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Internet of Things Journal","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10930920/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
LiDAR-Based End-to-End Temporal Perception for Vehicle-Infrastructure Cooperation
Temporal perception, defined as the capability to detect and track objects across temporal sequences, is a fundamental component of autonomous driving systems. Single-vehicle perception systems are limited by incomplete coverage caused by object occlusion and inherent blind spots, while cooperative perception systems face their own challenges in sensor calibration precision and positioning accuracy. To address these issues, we introduce LET-VIC, a LiDAR-based End-to-End Tracking framework for vehicle-infrastructure cooperation (VIC). First, we employ temporal self-attention and VIC cross-attention modules to effectively integrate temporal and spatial information from both the vehicle and infrastructure perspectives. Then, we develop a novel calibration error compensation (CEC) module to mitigate sensor misalignment and enable accurate feature alignment. Experiments on the V2X-Seq-SPD (vehicle-to-everything) dataset demonstrate that LET-VIC significantly outperforms baseline models. Compared to LET-V, LET-VIC achieves a +15.0% improvement in mean average precision (mAP) and a +17.3% improvement in average multi-object tracking accuracy (AMOTA). Furthermore, without considering communication delays, LET-VIC surpasses representative tracking-by-detection models, including V2VNet, FFNet, and PointPillars, by at least +13.7% in mAP and +13.1% in AMOTA, showcasing its robust detection and tracking performance. The experiments demonstrate that integrating multi-view perspectives, temporal sequences, or CEC in end-to-end training significantly improves both detection and tracking performance. All code will be open-sourced.
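The paper's code is not reproduced here; as a rough, hypothetical sketch of the fusion step the abstract describes, the following PyTorch snippet shows how temporal self-attention over the ego vehicle's feature history could be combined with cross-attention into infrastructure features. All names (VICFusion, veh_bev, infra_bev, the single-layer design, and the token shapes) are illustrative assumptions, not the authors' implementation, and the CEC module is only indicated as a comment.

    # Hypothetical sketch, not the authors' code: fuse vehicle and
    # infrastructure bird's-eye-view (BEV) feature tokens with temporal
    # self-attention and VIC cross-attention.
    import torch
    import torch.nn as nn

    class VICFusion(nn.Module):
        def __init__(self, dim=256, heads=8):
            super().__init__()
            # Temporal self-attention: current vehicle BEV tokens attend
            # to BEV tokens from previous frames of the same stream.
            self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            # VIC cross-attention: vehicle tokens query infrastructure
            # tokens, injecting the roadside view into the ego features.
            self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.norm1 = nn.LayerNorm(dim)
            self.norm2 = nn.LayerNorm(dim)

        def forward(self, veh_bev, veh_hist, infra_bev):
            # veh_bev:   (B, N, C) flattened current vehicle BEV tokens
            # veh_hist:  (B, M, C) flattened BEV tokens from past frames
            # infra_bev: (B, N, C) infrastructure BEV tokens, assumed to be
            #            warped into the vehicle frame; in the paper the CEC
            #            module would correct residual misalignment here
            t, _ = self.temporal_attn(veh_bev, veh_hist, veh_hist)
            x = self.norm1(veh_bev + t)
            c, _ = self.cross_attn(x, infra_bev, infra_bev)
            return self.norm2(x + c)

    # Usage on dummy tensors: batch of 2, a 32x32 BEV grid -> 1024 tokens.
    fusion = VICFusion()
    veh = torch.randn(2, 1024, 256)
    hist = torch.randn(2, 2048, 256)   # two past frames concatenated
    infra = torch.randn(2, 1024, 256)
    out = fusion(veh, hist, infra)     # -> (2, 1024, 256)

Keeping the vehicle tokens as the queries in both attention steps means the output stays in the ego coordinate frame, which matches the abstract's emphasis on aligning infrastructure features to the vehicle rather than the reverse.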
Journal introduction:
The IEEE Internet of Things (IoT) Journal publishes articles and review articles covering various aspects of IoT, including IoT system architecture, IoT enabling technologies, IoT communication and networking protocols such as network coding, and IoT services and applications. Topics encompass IoT's impact on sensor technologies, big data management, and future Internet design for applications such as smart cities and smart homes. Fields of interest include IoT architecture, such as things-centric, data-centric, and service-oriented IoT architectures; IoT enabling technologies and systematic integration, such as sensor technologies, big sensor data management, and future Internet design for IoT; IoT services, applications, and test-beds, such as IoT service middleware, IoT application programming interfaces (APIs), IoT application design, and IoT trials/experiments; and IoT standardization activities and technology development in different standards development organizations (SDOs), such as IEEE, IETF, ITU, 3GPP, and ETSI.