{"title":"SE-UF-PVNet:用于6DoF姿态估计的结构增强逐像素联合向量场投票网络","authors":"Yuqing Huang, Kefeng Wu, Fujun Sun, ChaoQuan Cai","doi":"10.1145/3603781.3603859","DOIUrl":null,"url":null,"abstract":"This paper focuses on addressing the problem of 6DoF object pose estimation with a known 3D model from a single RGB image. Some recent works have shown that structure information is effective for 6DoF pose estimation but they do not make full use. We propose SE-UF-PVNet, a more explicit, flexible, and powerful framework to introduce structure information. We construct a keypoint graph in the object coordinate system, introduce a Graph Convolution Network module to extract structure features from the keypoint graph, and concatenate them with features extracted from RGB images by the keypoints regressing network at pixel-wise. To make the estimation more robust, we predict direction vector fields and distance vector fields concurrently, propose a modified pixel-wise voting based keypoint localization algorithm on distance vector fields and further propose an algorithm based on union vector fields. Additionally, we add an Atrous Spatial Pyramid Pooling module to enhance the multi-scale feature sensing capability. Experiment results show that our method achieves 91.88 average ADD (-S) accuracy on Linemod dataset, which is the best among existing pixel-wise voting based methods. Similarly, our method achieves 49.01 average ADD (-S) accuracy on Occlusion Linemod dataset, which is the state-of-the-art among all compared methods without pose refinement.","PeriodicalId":391180,"journal":{"name":"Proceedings of the 2023 4th International Conference on Computing, Networks and Internet of Things","volume":"72 4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"SE-UF-PVNet: A Structure Enhanced Pixel-wise Union vector Fields Voting Network for 6DoF Pose Estimation\",\"authors\":\"Yuqing Huang, Kefeng Wu, Fujun Sun, ChaoQuan Cai\",\"doi\":\"10.1145/3603781.3603859\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper focuses on addressing the problem of 6DoF object pose estimation with a known 3D model from a single RGB image. Some recent works have shown that structure information is effective for 6DoF pose estimation but they do not make full use. We propose SE-UF-PVNet, a more explicit, flexible, and powerful framework to introduce structure information. We construct a keypoint graph in the object coordinate system, introduce a Graph Convolution Network module to extract structure features from the keypoint graph, and concatenate them with features extracted from RGB images by the keypoints regressing network at pixel-wise. To make the estimation more robust, we predict direction vector fields and distance vector fields concurrently, propose a modified pixel-wise voting based keypoint localization algorithm on distance vector fields and further propose an algorithm based on union vector fields. Additionally, we add an Atrous Spatial Pyramid Pooling module to enhance the multi-scale feature sensing capability. Experiment results show that our method achieves 91.88 average ADD (-S) accuracy on Linemod dataset, which is the best among existing pixel-wise voting based methods. 
Similarly, our method achieves 49.01 average ADD (-S) accuracy on Occlusion Linemod dataset, which is the state-of-the-art among all compared methods without pose refinement.\",\"PeriodicalId\":391180,\"journal\":{\"name\":\"Proceedings of the 2023 4th International Conference on Computing, Networks and Internet of Things\",\"volume\":\"72 4 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-05-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2023 4th International Conference on Computing, Networks and Internet of Things\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3603781.3603859\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2023 4th International Conference on Computing, Networks and Internet of Things","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3603781.3603859","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
This paper addresses 6DoF object pose estimation from a single RGB image, given a known 3D model of the object. Recent works have shown that structure information is effective for 6DoF pose estimation, but they do not make full use of it. We propose SE-UF-PVNet, a more explicit, flexible, and powerful framework for introducing structure information. We construct a keypoint graph in the object coordinate system, introduce a Graph Convolution Network (GCN) module to extract structure features from the keypoint graph, and concatenate these features pixel-wise with the features extracted from the RGB image by the keypoint regression network. To make the estimation more robust, we predict direction vector fields and distance vector fields concurrently, propose a modified pixel-wise voting keypoint localization algorithm operating on the distance vector fields, and further propose an algorithm based on the union of both vector fields. Additionally, we add an Atrous Spatial Pyramid Pooling (ASPP) module to enhance multi-scale feature sensing. Experimental results show that our method achieves 91.88 average ADD(-S) accuracy on the Linemod dataset, the best among existing pixel-wise voting based methods, and 49.01 average ADD(-S) accuracy on the Occlusion Linemod dataset, which is state-of-the-art among all compared methods without pose refinement.
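To make the union vector field idea concrete, the sketch below illustrates PVNet-style pixel-wise voting extended with a predicted distance field: each foreground pixel carries a unit direction vector toward a keypoint plus a scalar distance, so scaling the direction by the distance yields one keypoint hypothesis per pixel, and hypotheses are scored by directional agreement across pixels. This is a minimal NumPy illustration, not the authors' implementation; the function name, hypothesis count, and inlier threshold are illustrative assumptions.

```python
import numpy as np

def vote_keypoint_union(mask, directions, distances,
                        n_hyps=128, inlier_thresh=0.99, seed=0):
    """Localize one keypoint from per-pixel union vector fields.

    mask:       (H, W) boolean object mask.
    directions: (H, W, 2) unit vectors pointing from each pixel to the keypoint.
    distances:  (H, W) predicted pixel distance from each pixel to the keypoint.
    """
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None, 0
    pix = np.stack([xs, ys], axis=1).astype(np.float64)   # (N, 2) pixel coords
    vec = directions[ys, xs]                              # (N, 2) unit directions
    dist = distances[ys, xs]                              # (N,)   distances

    # Union field: direction scaled by distance gives a direct
    # keypoint hypothesis for every foreground pixel.
    hyps = pix + dist[:, None] * vec                      # (N, 2)

    # Sample hypotheses and score each by how many pixels' direction
    # vectors point toward it (cosine agreement), RANSAC-style.
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(hyps), size=min(n_hyps, len(hyps)), replace=False)
    best_score, best_kp = -1, None
    for h in hyps[idx]:
        to_h = h - pix
        to_h /= np.linalg.norm(to_h, axis=1, keepdims=True) + 1e-8
        score = int(((to_h * vec).sum(axis=1) > inlier_thresh).sum())
        if score > best_score:
            best_score, best_kp = score, h
    return best_kp, best_score
```

In a full PVNet-style pipeline, this voting would run once per keypoint on the network's predicted fields, and the resulting 2D keypoint locations would feed a PnP solver together with the known 3D keypoints to recover the 6DoF pose.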