A Depthwise Separable Convolution Based 6D Pose Estimation Network by Efficient 2D-3D Feature Fusion
Authors: Qi Feng, Chaochen Gu, Jiani Qin, Rui Xu
Published in: 2021 IEEE International Conference on Real-time Computing and Robotics (RCAR), 2021-07-15
DOI: 10.1109/RCAR52367.2021.9517387 (https://doi.org/10.1109/RCAR52367.2021.9517387)
Abstract
Precise 6D pose estimation of a target object is an essential prerequisite for robots to understand the real world. Previous 6D pose estimation methods based on 3D data commonly suffer from long training times, imperfect feature extraction, redundant model parameters, and complicated post-processing steps. This paper proposes a 2D-3D feature fusion module designed to enhance feature extraction in the 6D pose estimation network. In addition, we reduce the number of model parameters by adopting depthwise separable convolution, which accelerates training and reduces memory consumption. Experimental results on the LineMOD dataset demonstrate the effectiveness of our method: it achieves performance on par with or better than state-of-the-art 6D pose estimation methods while simultaneously reducing training time and the number of model parameters.
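The abstract does not describe the network's layer configuration. As a generic illustration of why depthwise separable convolution shrinks the parameter count, the sketch below compares the weight count of a standard k x k convolution with that of its factorization into a depthwise k x k convolution (one filter per input channel) followed by a 1 x 1 pointwise convolution. The function names and the 3x3, 128 -> 256 channel layer are hypothetical, bias terms are ignored, and nothing here reflects the authors' actual architecture.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a depthwise k x k conv (one filter per input channel)
    followed by a 1 x 1 pointwise conv that mixes channels (bias ignored)."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


def reduction_ratio(c_in: int, c_out: int, k: int) -> float:
    """Separable / standard weight ratio; algebraically 1/c_out + 1/k**2."""
    return depthwise_separable_params(c_in, c_out, k) / standard_conv_params(c_in, c_out, k)


if __name__ == "__main__":
    # Hypothetical layer sizes chosen for illustration only (not from the paper).
    c_in, c_out, k = 128, 256, 3
    std = standard_conv_params(c_in, c_out, k)
    sep = depthwise_separable_params(c_in, c_out, k)
    print(f"standard:  {std:,} weights")
    print(f"separable: {sep:,} weights")
    print(f"ratio:     {reduction_ratio(c_in, c_out, k):.3f}")
```

For this hypothetical layer the standard convolution needs 294,912 weights, while the separable version needs 33,920, a ratio of about 0.115. Because the ratio equals 1/c_out + 1/k², the savings grow with wider layers and larger kernels, which is consistent with the abstract's claim of reduced parameters and memory use, though the actual savings depend on the paper's specific layers.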