{"title":"6D Object Pose Estimation by Visual Descriptor","authors":"Qi-Wei Sun, Samuel Cheng","doi":"10.1145/3438872.3439095","journal":{"name":"Proceedings of the 2020 2nd International Conference on Robotics, Intelligent Control and Artificial Intelligence"},"publicationDate":"2020-10-17","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.1145/3438872.3439095"}
Extracting object features in a suitable representation is an essential component of object pose estimation. For symmetric objects and for smooth, textureless objects, pose estimates are often unsatisfactory because their features are difficult to extract and represent. This work introduces a method that represents object features with pixel-level visual descriptors and estimates the 6D pose from an RGB-D image. Compared with plain RGB images, RGB-D images carry richer information, so descriptors built from them can extract and represent object features more effectively. To speed up refinement, we refine the pose estimate with a network instead of ICP. The proposed architecture improves results on the YCB-Video dataset, especially for symmetric objects and other categories that were previously difficult to regress.
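
To make the terms in the abstract concrete, the sketch below shows two generic building blocks, not the paper's network. The first is lifting an RGB-D pixel to a 3D point with a pinhole camera model, which is the extra geometric information depth adds over RGB. The second is applying a 6D pose (3 rotational plus 3 translational degrees of freedom) to a model point. The function names and the intrinsics `fx`, `fy`, `cx`, `cy` are illustrative assumptions and do not come from the paper.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = List[List[float]]


def backproject(u: float, v: float, depth: float,
                fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Lift pixel (u, v) with metric depth to a 3D point in the camera frame.

    Assumes an ideal pinhole camera without lens distortion.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def rotation_z(angle_rad: float) -> Mat3:
    """Rotation matrix for a rotation of angle_rad about the z axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]


def apply_pose(R: Mat3, t: Vec3, p: Vec3) -> Vec3:
    """Map a model-frame point p into the camera frame as R @ p + t."""
    x, y, z = (sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
    return (x, y, z)


if __name__ == "__main__":
    # Hypothetical intrinsics, chosen only for illustration.
    point = backproject(420.0, 240.0, 2.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
    print("camera-frame point:", point)

    # A 6D pose: rotation (here 90 degrees about z) plus translation.
    R = rotation_z(math.pi / 2)
    t = (0.0, 0.0, 1.0)
    print("posed model point:", apply_pose(R, t, (1.0, 0.0, 0.0)))
```

A pose estimator of the kind the abstract describes would output `R` and `t`; a refinement stage, whether ICP or a learned network, would then adjust that pair so the posed model points align better with the back-projected depth points.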