DCL-Net: Deep Correspondence Learning Network for 6D Pose Estimation

Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision. Pub Date: 2022-10-11. DOI: 10.48550/arXiv.2210.05232

Hongyang Li, Jiehong Lin, K. Jia

Volume 38 1, pages 369-385

Citations: 3

Abstract

Establishing point correspondence between the camera and object coordinate systems is a promising way to solve for 6D object poses. However, the surrogate objectives of correspondence learning in 3D space are a step away from the true objectives of object pose estimation, making the learning suboptimal for the end task. In this paper, we address this shortcoming by introducing a new method, the Deep Correspondence Learning Network for direct 6D object pose estimation, shortened as DCL-Net. Specifically, DCL-Net employs two newly proposed Feature Disengagement and Alignment (FDA) modules to establish, in the feature space, partial-to-partial correspondence for the partial object observation and complete-to-complete correspondence for its complete CAD model, respectively. These result in aggregated pose and match feature pairs from the two coordinate systems, so the two FDA modules bring complementary advantages. The match feature pairs are used to learn confidence scores that measure the quality of the deep correspondence, while the pose feature pairs are weighted by the confidence scores for direct object pose regression. A confidence-based pose refinement network is also proposed to further improve pose precision in an iterative manner. Extensive experiments show that DCL-Net outperforms existing methods on three benchmark datasets, namely YCB-Video, LineMOD, and Occlusion-LineMOD; ablation studies also confirm the efficacy of our novel designs. Our code is released publicly at https://github.com/Gorilla-Lab-SCUT/DCL-Net.
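
For readers unfamiliar with the correspondence-based setting that the abstract starts from, the sketch below shows the classical closed-form alternative: given 3D points in the object frame, their matched points in the camera frame, and one confidence score per match, a weighted Kabsch (SVD) fit recovers the rotation and translation. This is not DCL-Net's method, which regresses the pose directly from confidence-weighted features with a learned network. The function names `weighted_rigid_fit` and `demo`, the synthetic data, and the hand-set confidence values are all assumptions made for illustration.

```python
import numpy as np


def weighted_rigid_fit(obj_pts, cam_pts, conf):
    """Return R, t minimizing sum_i w_i * ||R @ obj_i + t - cam_i||^2.

    Classical weighted Kabsch alignment (illustrative, not the paper's network).
    """
    obj_pts = np.asarray(obj_pts, dtype=float)
    cam_pts = np.asarray(cam_pts, dtype=float)
    w = np.asarray(conf, dtype=float)
    w = w / w.sum()  # normalize confidences so they sum to 1

    # Confidence-weighted centroids of both point sets.
    obj_c = w @ obj_pts
    cam_c = w @ cam_pts

    # Weighted 3x3 cross-covariance of the centered point sets.
    H = (obj_pts - obj_c).T @ (w[:, None] * (cam_pts - cam_c))

    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that det(R) = +1 (proper rotation).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cam_c - R @ obj_c
    return R, t


def demo():
    """Synthetic check: recover a known pose despite a few bad matches."""
    rng = np.random.default_rng(0)
    obj = rng.normal(size=(50, 3))  # hypothetical object-frame points

    angle = np.deg2rad(30.0)
    R_true = np.array([
        [np.cos(angle), -np.sin(angle), 0.0],
        [np.sin(angle), np.cos(angle), 0.0],
        [0.0, 0.0, 1.0],
    ])
    t_true = np.array([0.1, -0.2, 0.5])
    cam = obj @ R_true.T + t_true  # matched camera-frame points

    # Corrupt the first five matches and assign them near-zero confidence,
    # standing in for scores that a learned confidence head might output.
    cam[:5] += rng.normal(scale=1.0, size=(5, 3))
    conf = np.ones(50)
    conf[:5] = 1e-6

    R, t = weighted_rigid_fit(obj, cam, conf)
    return R, t, R_true, t_true


if __name__ == "__main__":
    R, t, R_true, t_true = demo()
    print("max rotation error:", np.abs(R - R_true).max())
    print("max translation error:", np.abs(t - t_true).max())
```

The closed-form fit optimizes a 3D alignment error over the correspondences rather than the pose itself, which is the kind of surrogate objective the abstract argues is a step away from the end task. The down-weighting of poor matches in the demo mirrors, in a very simplified way, the role the abstract gives to confidence scores.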

