Fusing point cloud with image for object detection using convolutional neural networks
(基于卷积神经网络的点云和图像融合目标检测)

Zhang Jiesong, Huang Yingping, Zhang Rui
光电工程 (Opto-Electronic Engineering), Journal Article, published 2021-04-15
DOI: 10.12086/OEE.2021.200325
Abstract: To address challenges in driverless applications such as varying object scale, complex illumination conditions, and the lack of reliable distance information, this paper proposes a multi-modal fusion method for object detection using convolutional neural networks. A depth map is generated by projecting the LiDAR point cloud onto the image plane and is fed to the network together with the RGB image. The input data are also processed with a sliding window to reduce information loss. Two feature-extraction networks extract features from the image and the depth map respectively, and the resulting feature maps are fused through a connection (concatenation) layer. Objects are detected by applying position regression and object classification to the fused feature map, and non-maximum suppression is used to refine the detections. Experimental results on the KITTI dataset show that the proposed method is robust under various illumination conditions and is particularly effective at detecting small objects. Compared with other methods, it offers a favorable combination of detection accuracy and speed.
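The depth-map generation step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes KITTI-style calibration matrices (the `P2` camera projection, `R0` rectification, and `Tr` LiDAR-to-camera matrices from a KITTI calibration file) and fills each pixel with the depth of the nearest projected LiDAR point.

```python
import numpy as np

def lidar_to_depth_map(points, P2, R0, Tr, h, w):
    """Project LiDAR points (N, 3) onto the image plane to build a sparse depth map.

    P2: 3x4 camera projection, R0: 3x3 rectification, Tr: 3x4 LiDAR-to-camera
    transform (KITTI-style calibration, an assumption of this sketch).
    """
    # Homogeneous LiDAR coordinates (N, 4)
    pts = np.hstack([points[:, :3], np.ones((len(points), 1))])
    # LiDAR frame -> rectified camera frame, shape (3, N)
    cam = R0 @ (Tr @ pts.T)
    z = cam[2]
    keep = z > 0.1                      # discard points behind/near the camera
    cam, z = cam[:, keep], z[keep]
    # Perspective projection onto the image plane
    img = P2 @ np.vstack([cam, np.ones((1, cam.shape[1]))])
    u = (img[0] / img[2]).astype(int)
    v = (img[1] / img[2]).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    depth = np.full((h, w), np.inf, dtype=np.float32)
    # When several points land on one pixel, keep the nearest one
    np.minimum.at(depth, (v[inside], u[inside]), z[inside])
    depth[np.isinf(depth)] = 0.0        # pixels with no LiDAR return
    return depth
```

The resulting single-channel map is sparse (LiDAR returns cover only part of the image), which is one reason the paper processes the inputs further before feature extraction.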
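The final non-maximum suppression stage can be sketched as standard greedy IoU-based NMS; the exact variant and threshold used in the paper are not specified here, so the values below are illustrative.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: boxes is (N, 4) as [x1, y1, x2, y2]; returns kept indices."""
    order = np.argsort(scores)[::-1]    # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of box i with all remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        # Drop boxes that overlap the kept box too much
        order = rest[iou <= iou_thresh]
    return keep
```

For example, two heavily overlapping detections of the same object collapse to the higher-scoring one, while a distant detection survives untouched.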