Yongchang Zhang, Yue Guo, Hanbing Niu, Bo Zhang, Yun Cao, Wenhao He
Title: SimpleFusion: 3D Object Detection by Fusing RGB Images and Point Clouds
Venue: 2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms (PRMVIA)
Publication date: 2023-03-01
DOI: 10.1109/prmvia58252.2023.00014 (https://doi.org/10.1109/prmvia58252.2023.00014)
Citation count: 0
Abstract
Achieving robust 3D object detection by fusing images and point clouds remains challenging. In this paper, we propose SimpleFusion, a novel 3D object detector that enables simple and efficient multi-sensor fusion. Our main motivation is to strengthen feature extraction within each modality and then fuse the resulting features in a unified space. Specifically, in the camera stream we build a new visual 3D object detector that leverages point cloud supervision for more accurate depth prediction; in the lidar stream, we introduce a robust 3D object detector that uses multi-view and multi-scale features to overcome the sparsity of point clouds. Finally, we propose a dynamic fusion module that uses dynamic weights to focus on the more confident features and achieve accurate 3D object detection. We evaluate our method on the nuScenes dataset, and the experimental results indicate that it outperforms other state-of-the-art methods by a significant margin.
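
The idea of fusing two streams with dynamic weights can be illustrated with a minimal sketch. The abstract does not say how the weights are computed, so everything below is an assumption made for illustration: each stream is assumed to supply a feature vector and a scalar confidence score for the same location in the unified space, and the weights are assumed to come from a softmax over those two scores. The names `dynamic_fuse`, `camera_score`, and `lidar_score` are hypothetical and do not come from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a short list of scalar scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_fuse(camera_feat, lidar_feat, camera_score, lidar_score):
    """Fuse two feature vectors for one location in the unified space.

    Assumption (not from the paper): the weights are a softmax over one
    scalar confidence score per stream, so the more confident stream
    contributes more to the fused feature.
    """
    if len(camera_feat) != len(lidar_feat):
        raise ValueError("both streams must have the same feature dimension")
    w_cam, w_lidar = softmax([camera_score, lidar_score])
    fused = [w_cam * c + w_lidar * l for c, l in zip(camera_feat, lidar_feat)]
    return fused, (w_cam, w_lidar)


# Hypothetical usage: the lidar stream is more confident at this location,
# so the fused feature lies closer to the lidar feature.
fused, weights = dynamic_fuse([1.0, 0.0], [0.0, 1.0],
                              camera_score=0.0, lidar_score=2.0)
```

In a real detector the scores would more plausibly be predicted by a small learned network over dense feature maps; the scalar version here only shows how the weighting behaves.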