摄像机重定位的多输出学习

2014 IEEE Conference on Computer Vision and Pattern Recognition Pub Date : 2014-06-23 DOI:10.1109/CVPR.2014.146

Abner Guzmán-Rivera, Pushmeet Kohli, B. Glocker, J. Shotton, T. Sharp, A. Fitzgibbon, S. Izadi

{"title":"摄像机重定位的多输出学习","authors":"Abner Guzmán-Rivera, Pushmeet Kohli, B. Glocker, J. Shotton, T. Sharp, A. Fitzgibbon, S. Izadi","doi":"10.1109/CVPR.2014.146","DOIUrl":null,"url":null,"abstract":"We address the problem of estimating the pose of a cam- era relative to a known 3D scene from a single RGB-D frame. We formulate this problem as inversion of the generative rendering procedure, i.e., we want to find the camera pose corresponding to a rendering of the 3D scene model that is most similar with the observed input. This is a non-convex optimization problem with many local optima. We propose a hybrid discriminative-generative learning architecture that consists of: (i) a set of M predictors which generate M camera pose hypotheses, and (ii) a 'selector' or 'aggregator' that infers the best pose from the multiple pose hypotheses based on a similarity function. We are interested in predictors that not only produce good hypotheses but also hypotheses that are different from each other. Thus, we propose and study methods for learning 'marginally relevant' predictors, and compare their performance when used with different selection procedures. We evaluate our method on a recently released 3D reconstruction dataset with challenging camera poses, and scene variability. Experiments show that our method learns to make multiple predictions that are marginally relevant and can effectively select an accurate prediction. Furthermore, our method outperforms the state-of-the-art discriminative approach for camera relocalization.","PeriodicalId":319578,"journal":{"name":"2014 IEEE Conference on Computer Vision and Pattern Recognition","volume":"25 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"98","resultStr":"{\"title\":\"Multi-output Learning for Camera Relocalization\",\"authors\":\"Abner Guzmán-Rivera, Pushmeet Kohli, B. Glocker, J. Shotton, T. Sharp, A. Fitzgibbon, S. Izadi\",\"doi\":\"10.1109/CVPR.2014.146\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We address the problem of estimating the pose of a cam- era relative to a known 3D scene from a single RGB-D frame. We formulate this problem as inversion of the generative rendering procedure, i.e., we want to find the camera pose corresponding to a rendering of the 3D scene model that is most similar with the observed input. This is a non-convex optimization problem with many local optima. We propose a hybrid discriminative-generative learning architecture that consists of: (i) a set of M predictors which generate M camera pose hypotheses, and (ii) a 'selector' or 'aggregator' that infers the best pose from the multiple pose hypotheses based on a similarity function. We are interested in predictors that not only produce good hypotheses but also hypotheses that are different from each other. Thus, we propose and study methods for learning 'marginally relevant' predictors, and compare their performance when used with different selection procedures. We evaluate our method on a recently released 3D reconstruction dataset with challenging camera poses, and scene variability. Experiments show that our method learns to make multiple predictions that are marginally relevant and can effectively select an accurate prediction. Furthermore, our method outperforms the state-of-the-art discriminative approach for camera relocalization.\",\"PeriodicalId\":319578,\"journal\":{\"name\":\"2014 IEEE Conference on Computer Vision and Pattern Recognition\",\"volume\":\"25 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-06-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"98\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 IEEE Conference on Computer Vision and Pattern Recognition\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CVPR.2014.146\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 IEEE Conference on Computer Vision and Pattern Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CVPR.2014.146","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 98

摘要

我们解决了从单个RGB-D帧中估计相对于已知3D场景的相机时代的姿势的问题。我们将这个问题表述为生成渲染过程的反转，也就是说，我们想要找到与观察到的输入最相似的3D场景模型渲染对应的相机姿势。这是一个具有许多局部最优解的非凸优化问题。我们提出了一种混合判别生成学习架构，它包括:(i)一组M个预测器，它生成M个相机姿势假设，以及(ii)一个“选择器”或“聚合器”，它根据相似函数从多个姿势假设中推断出最佳姿势。我们感兴趣的预测器不仅能产生好的假设，还能产生彼此不同的假设。因此，我们提出并研究了学习“边际相关”预测因子的方法，并比较了它们在不同选择过程中的表现。我们在最近发布的具有挑战性的相机姿势和场景可变性的3D重建数据集上评估我们的方法。实验表明，我们的方法可以学习做出多个边际相关的预测，并能有效地选择一个准确的预测。此外，我们的方法优于最先进的相机重定位判别方法。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Multi-output Learning for Camera Relocalization

We address the problem of estimating the pose of a cam- era relative to a known 3D scene from a single RGB-D frame. We formulate this problem as inversion of the generative rendering procedure, i.e., we want to find the camera pose corresponding to a rendering of the 3D scene model that is most similar with the observed input. This is a non-convex optimization problem with many local optima. We propose a hybrid discriminative-generative learning architecture that consists of: (i) a set of M predictors which generate M camera pose hypotheses, and (ii) a 'selector' or 'aggregator' that infers the best pose from the multiple pose hypotheses based on a similarity function. We are interested in predictors that not only produce good hypotheses but also hypotheses that are different from each other. Thus, we propose and study methods for learning 'marginally relevant' predictors, and compare their performance when used with different selection procedures. We evaluate our method on a recently released 3D reconstruction dataset with challenging camera poses, and scene variability. Experiments show that our method learns to make multiple predictions that are marginally relevant and can effectively select an accurate prediction. Furthermore, our method outperforms the state-of-the-art discriminative approach for camera relocalization.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2014 IEEE Conference on Computer Vision and Pattern Recognition

自引率

0.00%

发文量