VirPNet:用于 3D 物体检测的多模态虚拟点生成网络

IF 8.4 1区 计算机科学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS
Lin Wang;Shiliang Sun;Jing Zhao
{"title":"VirPNet:用于 3D 物体检测的多模态虚拟点生成网络","authors":"Lin Wang;Shiliang Sun;Jing Zhao","doi":"10.1109/TMM.2024.3410117","DOIUrl":null,"url":null,"abstract":"LiDAR and camera are the most common used sensors to percept the road scenes in autonomous driving. Current methods tried to fuse the two complementary information to boost 3D object detection. However, there are still two burning problems for multi-modality 3D object detection. One is the detection problem for the objects with sparse point clouds. The other is the misalignment of different sensors caused by the fixed physical locations. Therefore, this paper argues that explicitly fusing information from the two modalities with the physical misalignment is suboptimal for multi-modality 3D object detection. This paper presents a novel virtual point generation network, VirPNet, to overcome the multi-modality fusion challenges. On one hand, it completes sparse point cloud objects from image source and improves the final detection accuracy. On the other hand, it directly detects 3D targets from raw point clouds to avoid the physical misalignment between LiDAR and camera sensors. Different from previous point cloud completion methods, VirPNet fully utilizes the geometric information of pixels and point clouds and simplifies 3D point cloud regression into a 2D distance regression problem through a virtual plane. Experimental results on KITTI 3D object detection dataset and nuScenes dataset demonstrate that VirPNet improves the detection accuracy with the help of the generated virtual points.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"26 ","pages":"10597-10609"},"PeriodicalIF":8.4000,"publicationDate":"2024-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"VirPNet: A Multimodal Virtual Point Generation Network for 3D Object Detection\",\"authors\":\"Lin Wang;Shiliang Sun;Jing Zhao\",\"doi\":\"10.1109/TMM.2024.3410117\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"LiDAR and camera are the most common used sensors to percept the road scenes in autonomous driving. Current methods tried to fuse the two complementary information to boost 3D object detection. However, there are still two burning problems for multi-modality 3D object detection. One is the detection problem for the objects with sparse point clouds. The other is the misalignment of different sensors caused by the fixed physical locations. Therefore, this paper argues that explicitly fusing information from the two modalities with the physical misalignment is suboptimal for multi-modality 3D object detection. This paper presents a novel virtual point generation network, VirPNet, to overcome the multi-modality fusion challenges. On one hand, it completes sparse point cloud objects from image source and improves the final detection accuracy. On the other hand, it directly detects 3D targets from raw point clouds to avoid the physical misalignment between LiDAR and camera sensors. Different from previous point cloud completion methods, VirPNet fully utilizes the geometric information of pixels and point clouds and simplifies 3D point cloud regression into a 2D distance regression problem through a virtual plane. Experimental results on KITTI 3D object detection dataset and nuScenes dataset demonstrate that VirPNet improves the detection accuracy with the help of the generated virtual points.\",\"PeriodicalId\":13273,\"journal\":{\"name\":\"IEEE Transactions on Multimedia\",\"volume\":\"26 \",\"pages\":\"10597-10609\"},\"PeriodicalIF\":8.4000,\"publicationDate\":\"2024-06-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Multimedia\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10549844/\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Multimedia","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10549844/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0

摘要

激光雷达和摄像头是自动驾驶中感知道路场景最常用的传感器。目前的方法试图融合这两种互补信息来增强三维物体检测。然而,多模态三维物体检测仍存在两个棘手问题。一个是稀疏点云的物体检测问题。另一个问题是固定物理位置造成的不同传感器之间的错位。因此,本文认为,在多模态三维物体检测中,明确融合两种模态的信息以及物理错位是次优的。本文提出了一种新型虚拟点生成网络 VirPNet,以克服多模态融合的挑战。一方面,它完成了来自图像源的稀疏点云对象,提高了最终检测精度。另一方面,它直接从原始点云中检测三维目标,避免了激光雷达和相机传感器之间的物理错位。与以往的点云补全方法不同,VirPNet 充分利用了像素和点云的几何信息,通过虚拟平面将三维点云回归简化为二维距离回归问题。在 KITTI 3D 物体检测数据集和 nuScenes 数据集上的实验结果表明,VirPNet 借助生成的虚拟点提高了检测精度。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
VirPNet: A Multimodal Virtual Point Generation Network for 3D Object Detection
LiDAR and camera are the most common used sensors to percept the road scenes in autonomous driving. Current methods tried to fuse the two complementary information to boost 3D object detection. However, there are still two burning problems for multi-modality 3D object detection. One is the detection problem for the objects with sparse point clouds. The other is the misalignment of different sensors caused by the fixed physical locations. Therefore, this paper argues that explicitly fusing information from the two modalities with the physical misalignment is suboptimal for multi-modality 3D object detection. This paper presents a novel virtual point generation network, VirPNet, to overcome the multi-modality fusion challenges. On one hand, it completes sparse point cloud objects from image source and improves the final detection accuracy. On the other hand, it directly detects 3D targets from raw point clouds to avoid the physical misalignment between LiDAR and camera sensors. Different from previous point cloud completion methods, VirPNet fully utilizes the geometric information of pixels and point clouds and simplifies 3D point cloud regression into a 2D distance regression problem through a virtual plane. Experimental results on KITTI 3D object detection dataset and nuScenes dataset demonstrate that VirPNet improves the detection accuracy with the help of the generated virtual points.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
IEEE Transactions on Multimedia
IEEE Transactions on Multimedia 工程技术-电信学
CiteScore
11.70
自引率
11.00%
发文量
576
审稿时长
5.5 months
期刊介绍: The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信