An Efficient Target Recognition Model Based on Radar–Vision Fusion for Road Traffic Safety
{"title":"基于雷达视觉融合的道路交通安全高效目标识别模型","authors":"Karna Vishnu Vardhana Reddy, D. Venkat Reddy, M. V. Nageswara Rao, T. V. V. Satyanarayana, T. Aravinda Babu","doi":"10.1002/ett.70156","DOIUrl":null,"url":null,"abstract":"<div>\n \n <p>It is difficult for automated driving systems, or advanced driver assistance systems, to recognize and comprehend their surroundings. This paper proposes a transformer model-based approach for road object recognition using sensor fusion. Initially, data from the camera and millimeter-wave (mmWave) radar are simultaneously acquired and pre-processed. Since direct point cloud-to-image fusion is difficult for fusion object detection models, the radar point clouds are then circularly projected onto a 2-dimensional (2D) plane. Then, both the camera image and radar projection image enter different branches of the feature extraction model, utilizing a dual-path vision transformer (DualP-ViT) to complete feature extraction and fusion. The items are recognized after going through several layers of encoders and decoders. An encoder decoder-based vision transformer (EDViT) provides accurate measures of distance and velocity. Also, the vision sensors (cameras) produce high-resolution images with rich visual information. The proposed approach is implemented on the nuScenes dataset, and the performance is evaluated based on object detection metrics. The mean Average Precision (mAP), NuScenes Detection Score (NDS), Planning KL-Divergence (PKL), accuracy, precision, recall, f1-score, and latency performance obtained with the proposed approach is 59, 68, 0.6, 80, 79, 80, 78.9, and 10 ms. In the proposed approach, the robustness and accuracy of object detection is improved.</p>\n </div>","PeriodicalId":23282,"journal":{"name":"Transactions on Emerging Telecommunications Technologies","volume":"36 5","pages":""},"PeriodicalIF":2.5000,"publicationDate":"2025-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"An Efficient Target Recognition Model Based on Radar–Vision Fusion for Road Traffic Safety\",\"authors\":\"Karna Vishnu Vardhana Reddy, D. Venkat Reddy, M. V. Nageswara Rao, T. V. V. Satyanarayana, T. Aravinda Babu\",\"doi\":\"10.1002/ett.70156\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div>\\n \\n <p>It is difficult for automated driving systems, or advanced driver assistance systems, to recognize and comprehend their surroundings. This paper proposes a transformer model-based approach for road object recognition using sensor fusion. Initially, data from the camera and millimeter-wave (mmWave) radar are simultaneously acquired and pre-processed. Since direct point cloud-to-image fusion is difficult for fusion object detection models, the radar point clouds are then circularly projected onto a 2-dimensional (2D) plane. Then, both the camera image and radar projection image enter different branches of the feature extraction model, utilizing a dual-path vision transformer (DualP-ViT) to complete feature extraction and fusion. The items are recognized after going through several layers of encoders and decoders. An encoder decoder-based vision transformer (EDViT) provides accurate measures of distance and velocity. Also, the vision sensors (cameras) produce high-resolution images with rich visual information. The proposed approach is implemented on the nuScenes dataset, and the performance is evaluated based on object detection metrics. 
The mean Average Precision (mAP), NuScenes Detection Score (NDS), Planning KL-Divergence (PKL), accuracy, precision, recall, f1-score, and latency performance obtained with the proposed approach is 59, 68, 0.6, 80, 79, 80, 78.9, and 10 ms. In the proposed approach, the robustness and accuracy of object detection is improved.</p>\\n </div>\",\"PeriodicalId\":23282,\"journal\":{\"name\":\"Transactions on Emerging Telecommunications Technologies\",\"volume\":\"36 5\",\"pages\":\"\"},\"PeriodicalIF\":2.5000,\"publicationDate\":\"2025-05-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Transactions on Emerging Telecommunications Technologies\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1002/ett.70156\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"TELECOMMUNICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Transactions on Emerging Telecommunications Technologies","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/ett.70156","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"TELECOMMUNICATIONS","Score":null,"Total":0}
Karna Vishnu Vardhana Reddy, D. Venkat Reddy, M. V. Nageswara Rao, T. V. V. Satyanarayana, T. Aravinda Babu
It is difficult for automated driving systems and advanced driver assistance systems to recognize and comprehend their surroundings. This paper proposes a transformer-based approach for road object recognition using sensor fusion. First, data from the camera and millimeter-wave (mmWave) radar are acquired simultaneously and pre-processed. Because direct point-cloud-to-image fusion is difficult for fusion object detection models, the radar point clouds are circularly projected onto a two-dimensional (2D) plane. The camera image and the radar projection image then enter separate branches of the feature extraction model, where a dual-path vision transformer (DualP-ViT) completes feature extraction and fusion. Objects are recognized after passing through several layers of encoders and decoders. An encoder-decoder-based vision transformer (EDViT) provides accurate distance and velocity measurements, while the vision sensors (cameras) supply high-resolution images rich in visual information. The proposed approach is implemented on the nuScenes dataset, and its performance is evaluated with object detection metrics: the mean Average Precision (mAP), nuScenes Detection Score (NDS), Planning KL-Divergence (PKL), accuracy, precision, recall, F1-score, and latency obtained are 59, 68, 0.6, 80, 79, 80, 78.9, and 10 ms, respectively. The proposed approach improves the robustness and accuracy of object detection.
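The circular projection step is the part of the pipeline most amenable to a concrete illustration. Below is a minimal Python sketch of rendering mmWave radar points as filled circles on the camera's 2D image plane. The abstract does not specify the authors' exact projection, so the [x, y, z, velocity] point layout, the two-channel (depth, velocity) encoding, and the distance-scaled circle radius are illustrative assumptions.

```python
# Hedged sketch of circular radar-to-image projection; the point layout,
# channel encoding, and radius heuristic are assumptions, not the paper's code.
import numpy as np
import cv2

def project_radar_to_image(points, K, T_radar_to_cam, img_shape=(900, 1600)):
    """Render mmWave radar points as filled circles on the 2D image plane.

    points:          (N, 4) array of [x, y, z, velocity] in radar coordinates
    K:               (3, 3) camera intrinsic matrix
    T_radar_to_cam:  (4, 4) radar-to-camera extrinsic transform
    """
    depth_img = np.zeros(img_shape, dtype=np.float32)
    velo_img = np.zeros(img_shape, dtype=np.float32)

    # Move radar points into the camera frame (homogeneous coordinates).
    xyz1 = np.hstack([points[:, :3], np.ones((len(points), 1))])
    cam = (T_radar_to_cam @ xyz1.T).T[:, :3]

    for (x, y, z), vel in zip(cam, points[:, 3]):
        if z <= 0:                      # point is behind the camera
            continue
        u_h, v_h, w = K @ np.array([x, y, z])
        u, v = int(u_h / w), int(v_h / w)
        if not (0 <= u < img_shape[1] and 0 <= v < img_shape[0]):
            continue
        # Circle radius shrinks with distance, so nearby returns cover more pixels.
        radius = max(2, int(80.0 / z))
        cv2.circle(depth_img, (u, v), radius, color=float(z), thickness=-1)
        cv2.circle(velo_img, (u, v), radius, color=float(vel), thickness=-1)

    # Two-channel radar projection image: per-pixel depth and radial velocity.
    return np.stack([depth_img, velo_img], axis=-1)
```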
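Similarly, the dual-path feature extraction and fusion can be sketched as two independent patch-embedding branches joined by cross-attention. The abstract does not detail DualP-ViT's internals, so the embedding dimensions, the camera-queries-radar attention direction, and the residual fusion below are hypothetical choices made only to show the general structure.

```python
# Hedged sketch of a two-branch ("dual-path") fusion block in PyTorch.
# DualP-ViT's real architecture is not given in the abstract; this is one
# plausible realization, not the authors' implementation.
import torch
import torch.nn as nn

class DualPathFusion(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        # Separate patch embeddings: 3-channel camera image and 2-channel
        # radar projection (depth + velocity, as in the sketch above).
        self.cam_embed = nn.Conv2d(3, dim, kernel_size=16, stride=16)
        self.rad_embed = nn.Conv2d(2, dim, kernel_size=16, stride=16)
        # Cross-attention: camera tokens query radar tokens.
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, image, radar_proj):
        cam = self.cam_embed(image).flatten(2).transpose(1, 2)       # (B, N, dim)
        rad = self.rad_embed(radar_proj).flatten(2).transpose(1, 2)  # (B, N, dim)
        fused, _ = self.cross_attn(query=cam, key=rad, value=rad)
        return self.norm(cam + fused)   # residual fusion of the two streams

# Usage: tokens from a 640x640 camera frame and a matching radar projection.
model = DualPathFusion()
img = torch.randn(1, 3, 640, 640)
rad = torch.randn(1, 2, 640, 640)
tokens = model(img, rad)  # (1, 1600, 256), ready for encoder/decoder layers
```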
About the Journal:
Transactions on Emerging Telecommunications Technologies (ETT), formerly known as European Transactions on Telecommunications (ETT), has the following aims:
- to attract cutting-edge publications from leading researchers and research groups around the world
- to become a highly cited source of timely research findings in emerging fields of telecommunications
- to limit revision and publication cycles to a few months, making the journal significantly more attractive as a publication venue
- to become the leading journal for publishing the latest developments in telecommunications