{"title":"SE-UF-PVNet:用于6DoF姿态估计的结构增强逐像素联合向量场投票网络","authors":"Yuqing Huang, Kefeng Wu, Fujun Sun, ChaoQuan Cai","doi":"10.1145/3603781.3603859","DOIUrl":null,"url":null,"abstract":"This paper focuses on addressing the problem of 6DoF object pose estimation with a known 3D model from a single RGB image. Some recent works have shown that structure information is effective for 6DoF pose estimation but they do not make full use. We propose SE-UF-PVNet, a more explicit, flexible, and powerful framework to introduce structure information. We construct a keypoint graph in the object coordinate system, introduce a Graph Convolution Network module to extract structure features from the keypoint graph, and concatenate them with features extracted from RGB images by the keypoints regressing network at pixel-wise. To make the estimation more robust, we predict direction vector fields and distance vector fields concurrently, propose a modified pixel-wise voting based keypoint localization algorithm on distance vector fields and further propose an algorithm based on union vector fields. Additionally, we add an Atrous Spatial Pyramid Pooling module to enhance the multi-scale feature sensing capability. Experiment results show that our method achieves 91.88 average ADD (-S) accuracy on Linemod dataset, which is the best among existing pixel-wise voting based methods. Similarly, our method achieves 49.01 average ADD (-S) accuracy on Occlusion Linemod dataset, which is the state-of-the-art among all compared methods without pose refinement.","PeriodicalId":391180,"journal":{"name":"Proceedings of the 2023 4th International Conference on Computing, Networks and Internet of Things","volume":"72 4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"SE-UF-PVNet: A Structure Enhanced Pixel-wise Union vector Fields Voting Network for 6DoF Pose Estimation\",\"authors\":\"Yuqing Huang, Kefeng Wu, Fujun Sun, ChaoQuan Cai\",\"doi\":\"10.1145/3603781.3603859\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper focuses on addressing the problem of 6DoF object pose estimation with a known 3D model from a single RGB image. Some recent works have shown that structure information is effective for 6DoF pose estimation but they do not make full use. We propose SE-UF-PVNet, a more explicit, flexible, and powerful framework to introduce structure information. We construct a keypoint graph in the object coordinate system, introduce a Graph Convolution Network module to extract structure features from the keypoint graph, and concatenate them with features extracted from RGB images by the keypoints regressing network at pixel-wise. To make the estimation more robust, we predict direction vector fields and distance vector fields concurrently, propose a modified pixel-wise voting based keypoint localization algorithm on distance vector fields and further propose an algorithm based on union vector fields. Additionally, we add an Atrous Spatial Pyramid Pooling module to enhance the multi-scale feature sensing capability. Experiment results show that our method achieves 91.88 average ADD (-S) accuracy on Linemod dataset, which is the best among existing pixel-wise voting based methods. 
Similarly, our method achieves 49.01 average ADD (-S) accuracy on Occlusion Linemod dataset, which is the state-of-the-art among all compared methods without pose refinement.\",\"PeriodicalId\":391180,\"journal\":{\"name\":\"Proceedings of the 2023 4th International Conference on Computing, Networks and Internet of Things\",\"volume\":\"72 4 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-05-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2023 4th International Conference on Computing, Networks and Internet of Things\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3603781.3603859\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2023 4th International Conference on Computing, Networks and Internet of Things","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3603781.3603859","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
This paper addresses 6DoF object pose estimation from a single RGB image, given a known 3D model of the object. Recent works have shown that structure information is effective for 6DoF pose estimation, but they do not make full use of it. We propose SE-UF-PVNet, a more explicit, flexible, and powerful framework for introducing structure information. We construct a keypoint graph in the object coordinate system, introduce a Graph Convolution Network (GCN) module to extract structure features from the keypoint graph, and concatenate these features pixel-wise with the features extracted from the RGB image by the keypoint regression network. To make the estimation more robust, we predict direction vector fields and distance vector fields concurrently, propose a modified pixel-wise voting keypoint localization algorithm operating on the distance vector fields, and further propose an algorithm based on the union of both vector fields. Additionally, we add an Atrous Spatial Pyramid Pooling (ASPP) module to enhance multi-scale feature sensing. Experimental results show that our method achieves 91.88 average ADD(-S) accuracy on the Linemod dataset, the best among existing pixel-wise voting based methods, and 49.01 average ADD(-S) accuracy on the Occlusion Linemod dataset, which is state-of-the-art among all compared methods without pose refinement.
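To make the union vector field idea concrete, the sketch below illustrates PVNet-style pixel-wise voting extended with a predicted distance field: each foreground pixel carries a unit direction vector toward a keypoint plus a scalar distance, so scaling the direction by the distance yields one keypoint hypothesis per pixel, and hypotheses are scored by directional agreement across pixels. This is a minimal NumPy illustration, not the authors' implementation; the function name, hypothesis count, and inlier threshold are illustrative assumptions.

```python
import numpy as np

def vote_keypoint_union(mask, directions, distances,
                        n_hyps=128, inlier_thresh=0.99, seed=0):
    """Localize one keypoint from per-pixel union vector fields.

    mask:       (H, W) boolean object mask.
    directions: (H, W, 2) unit vectors pointing from each pixel to the keypoint.
    distances:  (H, W) predicted pixel distance from each pixel to the keypoint.
    """
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None, 0
    pix = np.stack([xs, ys], axis=1).astype(np.float64)   # (N, 2) pixel coords
    vec = directions[ys, xs]                              # (N, 2) unit directions
    dist = distances[ys, xs]                              # (N,)   distances

    # Union field: direction scaled by distance gives a direct
    # keypoint hypothesis for every foreground pixel.
    hyps = pix + dist[:, None] * vec                      # (N, 2)

    # Sample hypotheses and score each by how many pixels' direction
    # vectors point toward it (cosine agreement), RANSAC-style.
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(hyps), size=min(n_hyps, len(hyps)), replace=False)
    best_score, best_kp = -1, None
    for h in hyps[idx]:
        to_h = h - pix
        to_h /= np.linalg.norm(to_h, axis=1, keepdims=True) + 1e-8
        score = int(((to_h * vec).sum(axis=1) > inlier_thresh).sum())
        if score > best_score:
            best_score, best_kp = score, h
    return best_kp, best_score
```

In a full PVNet-style pipeline, this voting would run once per keypoint on the network's predicted fields, and the resulting 2D keypoint locations would feed a PnP solver together with the known 3D keypoints to recover the 6DoF pose.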