Learning Hierarchical Models for Class-Specific Reconstruction from Natural Data

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) Pub Date : 2018-06-01 DOI:10.1109/CVPRW.2018.00153

Arun C. S. Kumar, S. Bhandarkar, Mukta Prasad


Citations: 0

Abstract


Learning Hierarchical Models for Class-Specific Reconstruction from Natural Data

We propose a novel method for class-specific, single-view object detection, pose estimation and deformable 3D reconstruction, in which a two-pronged (sparse semantic and dense shape) representation is learned automatically from natural image data. Given a new image, the method estimates camera pose and a deformable reconstruction using an effective, incremental optimization. It extracts a continuous, scaled-orthographic pose (without resorting to regression and/or discretized 1D azimuth-based representations) and reconstructs a full free-form shape (rather than retrieving the closest 3D CAD shape proxy, as is typical in the state of the art). We learn our two-pronged model purely from natural image data, as automatically and faithfully as possible, reducing the human effort and bias typical of this problem. The pipeline combines data-driven, deep-learning-based semantic part learning with principled modelling and effective optimization of the problem's physics, shape deformation, pose and occlusion. The underlying sparse (part-based) representation of the object is computationally efficient for detection and discriminative tasks, whereas the overlaid dense (skin-like) representation models and realistically renders comprehensive 3D structure, including natural deformation and occlusion. The results for the car class are visually pleasing and, importantly, also outperform the state of the art quantitatively. Our two-pronged object representation shows promise for more accurate 3D scene understanding in real-world applications such as virtual/mixed reality and autonomous navigation, to name a few.
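The abstract refers to a continuous, scaled-orthographic pose. As a rough illustration of that camera model only (not the authors' implementation, whose parameterization is not given here), the sketch below projects 3D points with a rotation, a single scale factor and a 2D translation. The function names `rotation_zyx` and `project_scaled_orthographic`, and the yaw/pitch/roll convention, are assumptions chosen for this example.

```python
import math


def rotation_zyx(yaw, pitch, roll):
    """Return the 3x3 rotation R = Rz(yaw) * Ry(pitch) * Rx(roll); angles in radians.

    The Euler-angle convention is an assumption made for this sketch.
    """
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]


def project_scaled_orthographic(points, rotation, scale, translation):
    """Project 3D points to 2D as x = scale * (first two rows of R) * X + t.

    Depth along the viewing axis is dropped, so one global scale stands in
    for perspective foreshortening.
    """
    projected = []
    for point in points:
        u = scale * sum(rotation[0][k] * point[k] for k in range(3)) + translation[0]
        v = scale * sum(rotation[1][k] * point[k] for k in range(3)) + translation[1]
        projected.append((u, v))
    return projected


if __name__ == "__main__":
    # Hypothetical part locations on an object, in object coordinates.
    parts = [(1.0, 2.0, 3.0), (1.0, 2.0, 100.0)]
    pose = rotation_zyx(0.0, 0.0, 0.0)
    print(project_scaled_orthographic(parts, pose, 2.0, (10.0, 20.0)))
```

Because the pose is a continuous rotation plus scale and translation, it can be refined by an optimizer directly, in contrast to the discretized azimuth bins the abstract mentions. With an identity rotation, the two example points differ only in depth and therefore project to the same image location.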
