Real-time pose estimation of 3D objects from camera images using neural networks

Proceedings of International Conference on Robotics and Automation. Pub Date: 1997-04-20. DOI: 10.1109/ROBOT.1997.606781

P. Wunsch, S. Winkler, G. Hirzinger


Citations: 45

Abstract

This paper addresses the problem of obtaining a rough estimate of three-dimensional object position and orientation from a single two-dimensional camera image. Such an estimate is required by most 3-D to 2-D registration and tracking methods, which can efficiently refine an initial value by numerical optimization to recover 3-D pose precisely. However, the analytic computation of an initial pose guess requires solving an extremely complex correspondence problem, owing to the large number of topologically distinct aspects that arise when a three-dimensional opaque object is imaged by a camera. Hence, general analytic methods fail to achieve real-time performance, and most tracking and registration systems are initialized interactively or by ad hoc heuristics. To overcome these limitations, we present a novel method for approximate object pose estimation that is based on a neural network and can easily be implemented in real time. A modification of Kohonen's self-organizing feature map is systematically trained with computer-generated object views so that it responds to a preprocessed image with one or more sets of object orientation parameters. The key idea proposed here is to choose the network topology in accordance with the representation of 3-D orientation. Experimental results from both simulated and real images demonstrate that a pose estimate within the accuracy requirements can be found in more than 81% of all cases. The current implementation operates at 10 Hz on real-world images.
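
The idea of matching the network topology to the orientation representation can be illustrated with a small toy sketch. It is not the authors' network: the abstract does not specify the preprocessing, the orientation parameterization, or the modification made to Kohonen's map. The sketch assumes that orientation is reduced to a viewing direction on the unit sphere, that map nodes sit on a Fibonacci lattice over that sphere, and that a hypothetical `render_features` function (depths of a few made-up `MODEL_POINTS` along the viewing direction) stands in for a preprocessed, computer-generated view. Node weights are seeded from the views at the node directions, which is a simplification chosen to keep the map ordered.

```python
import math
import random

# Hypothetical 3-D model points standing in for a real object model.
MODEL_POINTS = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (0, 1, 1), (1, 0, 1)]


def fibonacci_sphere(n):
    """Return n roughly evenly spaced unit vectors (the map's node directions)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    dirs = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        dirs.append((r * math.cos(golden_angle * i), r * math.sin(golden_angle * i), z))
    return dirs


def random_direction(rng):
    """Sample a viewing direction uniformly on the unit sphere."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)
    return (r * math.cos(phi), r * math.sin(phi), z)


def angle(a, b):
    """Angle in radians between two unit vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return math.acos(max(-1.0, min(1.0, dot)))


def render_features(direction):
    """Toy stand-in for a preprocessed synthetic view (assumption, not the paper's)."""
    return [sum(p * d for p, d in zip(point, direction)) for point in MODEL_POINTS]


def best_matching_units(weights, feature, k=1):
    """Indices of the k nodes whose weight vectors are closest to the feature."""
    def sq_dist(j):
        return sum((w - f) ** 2 for w, f in zip(weights[j], feature))
    return sorted(range(len(weights)), key=sq_dist)[:k]


def train_som(n_nodes=100, steps=1000, seed=0):
    """Kohonen-style training with a neighbourhood measured on the sphere."""
    rng = random.Random(seed)
    node_dirs = fibonacci_sphere(n_nodes)
    weights = [render_features(d) for d in node_dirs]  # ordered initialization
    for t in range(steps):
        frac = t / steps
        lr = 0.3 * (1.0 - frac) + 0.05 * frac      # decaying learning rate
        sigma = 0.4 * (1.0 - frac) + 0.15 * frac   # decaying radius in radians
        feature = render_features(random_direction(rng))
        winner = best_matching_units(weights, feature)[0]
        for j, d in enumerate(node_dirs):
            h = math.exp(-angle(d, node_dirs[winner]) ** 2 / (2.0 * sigma * sigma))
            if h < 1e-3:
                continue  # node is too far from the winner to be updated
            w = weights[j]
            for i in range(len(w)):
                w[i] += lr * h * (feature[i] - w[i])
    return node_dirs, weights


def estimate_directions(node_dirs, weights, feature, k=1):
    """Return the k candidate viewing directions that best explain the feature."""
    return [node_dirs[j] for j in best_matching_units(weights, feature, k)]
```

Calling `estimate_directions(node_dirs, weights, render_features(d), k=3)` after `train_som()` returns three coarse candidate directions, mirroring the abstract's "one or more sets of object orientation parameters". In a full system such a coarse estimate would then be handed to a numerical registration step for refinement, which this sketch omits.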


