MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare

Yann Labbé, Lucas Manuelli, Arsalan Mousavian, Stephen Tyree, Stan Birchfield, Jonathan Tremblay, Justin Carpentier, Mathieu Aubry, Dieter Fox, Josef Sivic
{"title":"MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare","authors":"Yann Labb'e, Lucas Manuelli, Arsalan Mousavian, Stephen Tyree, Stan Birchfield, Jonathan Tremblay, Justin Carpentier, Mathieu Aubry, D. Fox, Josef Sivic","doi":"10.48550/arXiv.2212.06870","DOIUrl":null,"url":null,"abstract":"We introduce MegaPose, a method to estimate the 6D pose of novel objects, that is, objects unseen during training. At inference time, the method only assumes knowledge of (i) a region of interest displaying the object in the image and (ii) a CAD model of the observed object. The contributions of this work are threefold. First, we present a 6D pose refiner based on a render&compare strategy which can be applied to novel objects. The shape and coordinate system of the novel object are provided as inputs to the network by rendering multiple synthetic views of the object's CAD model. Second, we introduce a novel approach for coarse pose estimation which leverages a network trained to classify whether the pose error between a synthetic rendering and an observed image of the same object can be corrected by the refiner. Third, we introduce a large-scale synthetic dataset of photorealistic images of thousands of objects with diverse visual and shape properties and show that this diversity is crucial to obtain good generalization performance on novel objects. We train our approach on this large synthetic dataset and apply it without retraining to hundreds of novel objects in real images from several pose estimation benchmarks. Our approach achieves state-of-the-art performance on the ModelNet and YCB-Video datasets. An extensive evaluation on the 7 core datasets of the BOP challenge demonstrates that our approach achieves performance competitive with existing approaches that require access to the target objects during training. Code, dataset and trained models are available on the project page: https://megapose6d.github.io/.","PeriodicalId":273870,"journal":{"name":"Conference on Robot Learning","volume":"133 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"31","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Conference on Robot Learning","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2212.06870","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 31

Abstract

We introduce MegaPose, a method to estimate the 6D pose of novel objects, that is, objects unseen during training. At inference time, the method assumes knowledge only of (i) a region of interest displaying the object in the image and (ii) a CAD model of the observed object. The contributions of this work are threefold. First, we present a 6D pose refiner based on a render & compare strategy that can be applied to novel objects. The shape and coordinate system of the novel object are provided as inputs to the network by rendering multiple synthetic views of the object's CAD model. Second, we introduce a novel approach for coarse pose estimation that leverages a network trained to classify whether the pose error between a synthetic rendering and an observed image of the same object can be corrected by the refiner. Third, we introduce a large-scale synthetic dataset of photorealistic images of thousands of objects with diverse visual and shape properties, and show that this diversity is crucial for good generalization to novel objects. We train our approach on this large synthetic dataset and apply it, without retraining, to hundreds of novel objects in real images from several pose estimation benchmarks. Our approach achieves state-of-the-art performance on the ModelNet and YCB-Video datasets. An extensive evaluation on the 7 core datasets of the BOP challenge demonstrates that our approach achieves performance competitive with existing approaches that require access to the target objects during training. Code, dataset, and trained models are available on the project page: https://megapose6d.github.io/.
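To make the coarse-to-fine pipeline described in the abstract concrete, here is a minimal Python sketch of the inference loop. This is not the authors' released code: `render`, `coarse_classifier`, and `refiner_network` are hypothetical stubs standing in for the CAD renderer and the two trained networks, and the pose-hypothesis sampling is a placeholder.

```python
"""Illustrative sketch of MegaPose-style coarse-to-fine pose inference.

All networks and the renderer are replaced by trivial stubs; only the
control flow (score coarse hypotheses, then iteratively refine) follows
the method described in the abstract.
"""
import numpy as np


def random_rotation(rng):
    # Rotation matrix via QR decomposition of a Gaussian matrix.
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    return q * np.sign(np.linalg.det(q))  # force det(R) = +1


def sample_coarse_hypotheses(n, rng):
    """Candidate 4x4 object-to-camera poses covering orientation space."""
    poses = []
    for _ in range(n):
        T = np.eye(4)
        T[:3, :3] = random_rotation(rng)
        T[:3, 3] = (0.0, 0.0, 0.5)  # placeholder: object 0.5 m from camera
        poses.append(T)
    return poses


def render(cad_model, pose):
    """Stand-in for the synthetic CAD renderer; returns a dummy image."""
    return np.zeros((64, 64), dtype=np.float32)


def coarse_classifier(observed_crop, rendering):
    """Stand-in for the network that classifies whether the pose error
    between rendering and observation is correctable by the refiner.
    Here: a trivial image-difference proxy score."""
    return -float(np.abs(observed_crop - rendering).mean())


def refiner_network(observed_crop, renderings):
    """Stand-in for the render & compare refiner, which would predict a
    corrective transform from the observation and rendered views.
    Here: identity (no-op)."""
    return np.eye(4)


def estimate_pose(observed_crop, cad_model, n_hypotheses=64, n_iters=5):
    rng = np.random.default_rng(0)
    # 1) Coarse stage: render each hypothesis, keep the best-scored one.
    hypotheses = sample_coarse_hypotheses(n_hypotheses, rng)
    scores = [coarse_classifier(observed_crop, render(cad_model, T))
              for T in hypotheses]
    pose = hypotheses[int(np.argmax(scores))]
    # 2) Refinement stage: render the CAD model at the current estimate
    #    (the paper uses multiple viewpoints) and apply the correction.
    for _ in range(n_iters):
        views = [render(cad_model, pose)]
        delta = refiner_network(observed_crop, views)
        pose = delta @ pose
    return pose


if __name__ == "__main__":
    crop = np.zeros((64, 64), dtype=np.float32)
    print(estimate_pose(crop, cad_model=None))
```

Note that framing the coarse stage as classification ("can the refiner fix this error?") rather than regression lets the same scoring network rank arbitrary rendered hypotheses for objects never seen in training.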