Object-RPE: Dense 3D Reconstruction and Pose Estimation with Convolutional Neural Networks for Warehouse Robots

Dinh-Cuong Hoang, Todor Stoyanov, A. Lilienthal
2019 European Conference on Mobile Robots (ECMR), published 2019-08-22.
DOI: 10.1109/ECMR.2019.8870927
Citations: 10

Abstract

We present a system for accurate 3D instance-aware semantic reconstruction and 6D pose estimation using an RGB-D camera. Our framework couples convolutional neural networks (CNNs) with a state-of-the-art dense Simultaneous Localisation and Mapping (SLAM) system, ElasticFusion, to achieve both high-quality semantic reconstruction and robust 6D pose estimation for relevant objects. The method presented in this paper extends a high-quality instance-aware semantic 3D mapping system from previous work [1] by adding a 6D object pose estimator. While the main trend in CNN-based 6D pose estimation has been to infer an object's position and orientation from a single view of the scene, our approach explores pose estimation from multiple viewpoints, under the conjecture that combining multiple predictions can improve the robustness of an object detection system. The resulting system is capable of producing high-quality object-aware semantic reconstructions of room-sized environments, as well as accurately detecting objects and their 6D poses. The developed method has been verified through experimental validation on the YCB-Video dataset and a newly collected warehouse object dataset. Experimental results confirm that the proposed system improves over state-of-the-art methods in terms of surface reconstruction and object pose prediction. Our code and video are available at https://sites.google.com/view/object-rpe.
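The abstract does not specify how per-view predictions are combined; the following is a minimal, hypothetical sketch of one common fusion rule for the multi-view idea described above: a confidence-weighted average of predicted translations, and a weighted quaternion average (via the dominant eigenvector of the weighted outer-product matrix) for rotations. The function name `fuse_pose_predictions` and the weighting scheme are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def fuse_pose_predictions(translations, quaternions, confidences):
    """Fuse N per-view 6D pose predictions into a single pose.

    translations: (N, 3) predicted object positions, one per viewpoint
    quaternions:  (N, 4) unit quaternions (w, x, y, z), one per viewpoint
    confidences:  (N,)   per-view prediction confidences
    Returns the fused (translation, quaternion).
    """
    w = np.asarray(confidences, dtype=float)
    w = w / w.sum()                      # normalize weights

    # Confidence-weighted mean of the translation predictions.
    t = (w[:, None] * np.asarray(translations, dtype=float)).sum(axis=0)

    # Weighted quaternion average: the fused rotation is the eigenvector
    # of the weighted outer-product matrix M with the largest eigenvalue.
    Q = np.asarray(quaternions, dtype=float)
    M = (w[:, None, None] * Q[:, :, None] * Q[:, None, :]).sum(axis=0)
    eigvals, eigvecs = np.linalg.eigh(M)  # ascending eigenvalues
    q = eigvecs[:, -1]                    # dominant eigenvector
    if q[0] < 0:                          # resolve the q / -q sign ambiguity
        q = -q
    return t, q
```

For example, three noisy views of an object one metre in front of the camera, with small opposite rotation errors, fuse back to roughly the identity rotation at z = 1.0.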