A Geometric Convolutional Neural Network for 3D Object Detection
Yawen Lu, Qianyu Guo, G. Lu
2019 IEEE Global Conference on Signal and Information Processing (GlobalSIP), published 2019-11-01
DOI: 10.1109/GlobalSIP45357.2019.8969077 (https://doi.org/10.1109/GlobalSIP45357.2019.8969077)
Citations: 1
Abstract
We propose a method for accurate 3D vehicle detection based on geometric deep neural networks. From only a single RGB image, the framework recovers 3D positions and predicts 3D bounding boxes. In particular, the algorithm leverages single-image depth estimation and semantic segmentation to produce a 3D point cloud for specific objects. By geometrically constraining the object dimensions, an accurate and stable 3D bounding box that tightly fits the real object can be estimated. We verify the effectiveness and robustness of our method by comparing it with other recent state-of-the-art methods on the challenging KITTI 3D benchmark dataset as well as the synthetic Virtual KITTI dataset. Without requiring ground-truth 3D labels, our method achieves competitive and robust performance in 3D scene understanding and detection.
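The step of turning a depth map and a segmentation mask into an object point cloud can be illustrated with a minimal sketch. It assumes a standard pinhole camera model; the function names, the intrinsics (fx, fy, cx, cy), and the toy data are hypothetical and are not taken from the paper. The axis-aligned box at the end is only a simple stand-in, not the dimension-constrained box fitting the abstract describes.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def backproject_masked_depth(
    depth: Sequence[Sequence[float]],
    mask: Sequence[Sequence[int]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point]:
    """Lift masked pixels with positive depth into camera-frame 3D points.

    Assumes a pinhole camera: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    """
    points: List[Point] = []
    for v, (depth_row, mask_row) in enumerate(zip(depth, mask)):
        for u, (z, keep) in enumerate(zip(depth_row, mask_row)):
            # Skip pixels outside the object mask or without a valid depth.
            if keep and z > 0:
                x = (u - cx) * z / fx
                y = (v - cy) * z / fy
                points.append((x, y, z))
    return points


def axis_aligned_box(points: Sequence[Point]) -> Tuple[Point, Point]:
    """Return the (min corner, max corner) of the points' axis-aligned extent."""
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


# Toy 2x3 depth map (metres) and object mask; all values are made up.
depth = [[10.0, 10.0, 0.0],
         [10.0, 12.0, 12.0]]
mask = [[0, 1, 1],
        [0, 1, 1]]

points = backproject_masked_depth(depth, mask, fx=2.0, fy=2.0, cx=1.0, cy=0.0)
box_min, box_max = axis_aligned_box(points)
```

In practice the depth map and mask would come from the depth-estimation and segmentation networks, and the intrinsics from the dataset calibration files. A raw min/max box is sensitive to depth noise and mask errors, which is one plausible reason to constrain the box with prior object dimensions as the abstract proposes.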