SL3D - Single Look 3D Object Detection based on RGB-D Images

G. Erabati, Helder Araújo
Published in: 2020 Digital Image Computing: Techniques and Applications (DICTA), 2020-11-29
DOI: https://doi.org/10.1109/DICTA51227.2020.9363404
Citations: 0

Abstract

We present SL3D (Single Look 3D object detection), an approach for detecting 3D objects from an RGB-D image pair. It is a proposal-free, single-stage 3D object detection method that leverages multi-scale fusion of RGB and depth feature maps together with multi-layer predictions. The method takes a pair of RGB and depth images as input and outputs predicted 3D bounding boxes. The SL3D network comprises two modules: multi-scale feature fusion and multi-layer prediction. The multi-scale feature fusion module fuses multi-scale features from the RGB and depth feature maps, which the multi-layer prediction module then uses for 3D object detection. Each location of a prediction layer is assigned a set of predefined 3D prior boxes to account for the varying shapes of 3D objects. The network regresses the predicted 3D bounding boxes as offsets to these 3D prior boxes, and duplicate 3D bounding boxes are removed by applying 3D non-maximum suppression. The network is trained end-to-end on the publicly available SUN RGB-D dataset. With ResNeXt50, SL3D achieves 31.77 mAP on the SUN RGB-D test set at an inference speed of approximately 4 fps; with MobileNetV2, it achieves approximately 15 fps with a reduction of around 2 mAP. These quantitative results show that the proposed method is competitive with state-of-the-art methods on the SUN RGB-D dataset while running at near real-time inference speed.
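The abstract describes two post-processing steps: decoding regressed offsets against 3D prior boxes, and removing duplicates with 3D non-maximum suppression. The sketch below illustrates the general idea only and is not the authors' implementation. It assumes an SSD-style offset encoding (centre offsets scaled by prior size, log-scaled sizes), axis-aligned boxes with no heading angle, and an arbitrary IoU threshold of 0.25. The names `decode_box`, `iou_3d`, and `nms_3d` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# A box is (cx, cy, cz, sx, sy, sz): centre and size along each axis.
# Assumption: boxes are axis-aligned; the paper's boxes may carry orientation.
Box = Tuple[float, float, float, float, float, float]


def decode_box(prior: Box, offset: Sequence[float]) -> Box:
    """Turn a regressed offset into a box, relative to its prior box.

    Assumed SSD-style encoding (not confirmed by the abstract):
    centre offsets are scaled by the prior size, sizes are log-scaled.
    """
    px, py, pz, psx, psy, psz = prior
    dx, dy, dz, dsx, dsy, dsz = offset
    return (
        px + dx * psx,
        py + dy * psy,
        pz + dz * psz,
        psx * math.exp(dsx),
        psy * math.exp(dsy),
        psz * math.exp(dsz),
    )


def iou_3d(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned 3D boxes."""
    inter = 1.0
    for axis in range(3):
        lo = max(a[axis] - a[axis + 3] / 2, b[axis] - b[axis + 3] / 2)
        hi = min(a[axis] + a[axis + 3] / 2, b[axis] + b[axis + 3] / 2)
        if hi <= lo:
            return 0.0  # no overlap along this axis
        inter *= hi - lo
    vol_a = a[3] * a[4] * a[5]
    vol_b = b[3] * b[4] * b[5]
    return inter / (vol_a + vol_b - inter)


def nms_3d(boxes: List[Box], scores: List[float],
           iou_threshold: float = 0.25) -> List[int]:
    """Greedy 3D NMS: keep the highest-scoring box, drop overlapping ones.

    Returns indices of the kept boxes, in descending score order.
    The default threshold is illustrative, not taken from the paper.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou_3d(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two nearly identical detections and one far away.
priors = [(0, 0, 0, 1, 1, 1), (0.1, 0, 0, 1, 1, 1), (5, 5, 5, 2, 2, 2)]
offsets = [(0, 0, 0, 0, 0, 0)] * 3
boxes = [decode_box(p, o) for p, o in zip(priors, offsets)]
print(nms_3d(boxes, [0.9, 0.8, 0.7]))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.82, so it is suppressed, while the distant third box survives. A real detector would typically run this per class and on batched tensors rather than Python lists.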