PrimitivePose: 3D Bounding Box Prediction of Unseen Objects via Synthetic Geometric Primitives

A. Kriegler, Csaba Beleznai, Markus Murschitz, Kai Göbel, M. Gelautz
DOI: 10.1109/IRC55401.2022.00040
Published in: 2022 Sixth IEEE International Conference on Robotic Computing (IRC), December 2022

Abstract

This paper studies the challenging problem of 3D pose and size estimation for multi-object scene configurations from stereo views. Most existing methods rely on CAD models and are therefore limited to a predefined set of known object categories. This closed-set constraint limits the range of applications for robots interacting in dynamic environments where previously unseen objects may appear. To address this problem we propose an oriented 3D bounding box detection method that does not require 3D models or semantic information of the objects and is learned entirely from the category-agnostic domain, relying on purely geometric cues. These geometric cues are objectness and compactness, as represented in the synthetic domain by generating a diverse set of stereo image pairs featuring pose-annotated geometric primitives. We then use stereo matching and derive three representations for 3D image content: disparity maps, surface normal images, and a novel representation of disparity-scaled surface normal images. The proposed model, PrimitivePose, is trained as a single-stage multi-task neural network using any one of those representations as input and 3D oriented bounding boxes, object centroids and object sizes as output. We evaluate PrimitivePose for 3D bounding box prediction on difficult unseen objects in a tabletop environment and compare it to the popular PoseCNN model. A video showcasing our results can be found at: https://preview.tinyurl.com/2pccumvt.
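To illustrate the kind of geometric input representations the abstract describes, the sketch below derives surface normals from a disparity map via image-space gradients and then scales them by normalized disparity. This is a hedged, minimal reading of "surface normal images" and "disparity-scaled surface normal images"; the paper's exact formulation (camera intrinsics, back-projection to metric depth, normal estimation method) is not specified in the abstract, so the function names and the scaling scheme here are illustrative assumptions.

```python
import numpy as np

def surface_normals_from_disparity(disparity):
    """Estimate per-pixel unit surface normals from a disparity map
    using image-space gradients. An illustrative approximation, not
    the paper's stated pipeline (which may back-project to metric
    depth with calibrated intrinsics first)."""
    dz_dx = np.gradient(disparity, axis=1)  # horizontal slope
    dz_dy = np.gradient(disparity, axis=0)  # vertical slope
    # Normal of the local surface patch z = f(x, y): (-df/dx, -df/dy, 1)
    normals = np.dstack([-dz_dx, -dz_dy, np.ones_like(disparity)])
    norm = np.linalg.norm(normals, axis=2, keepdims=True)
    return normals / np.clip(norm, 1e-8, None)

def disparity_scaled_normals(disparity, normals):
    """Scale each unit normal by normalized disparity so that nearer
    surfaces (larger disparity) carry larger magnitudes -- one
    plausible reading of 'disparity-scaled surface normal images'."""
    d = disparity / max(float(disparity.max()), 1e-8)
    return normals * d[..., None]
```

For example, feeding a planar disparity ramp through `surface_normals_from_disparity` yields a constant tilted normal over the plane's interior, while `disparity_scaled_normals` attenuates the normals on the far (low-disparity) side of the ramp.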