Li Wang, Ruifeng Li, Jingwen Sun, S. H. Soon, C. K. Quah, Lijun Zhao
2018 15th International Conference on Control, Automation, Robotics and Vision (ICARCV)
Publication date: 2018-11-01
DOI: 10.1109/ICARCV.2018.8581204 (https://doi.org/10.1109/ICARCV.2018.8581204)
Citations: 2
Feature-Based and Convolutional Neural Network Fusion Method for Visual Relocalization
Relocalization is a necessary module for mobile robots that operate autonomously in an environment over the long term. Current visual relocalization algorithms fall mainly into feature-based methods and CNN-based (Convolutional Neural Network) methods. Feature-based methods achieve high localization accuracy in feature-rich scenes, but their error grows large, or they fail outright, under motion blur, in texture-less scenes, and under changing viewing angles. CNN-based methods are usually more robust but less accurate. This paper therefore proposes a visual relocalization algorithm that combines the advantages of the two. A BoVW (Bag of Visual Words) model is used to search the training dataset for the most similar image. PnP (Perspective-n-Point) and RANSAC (Random Sample Consensus) are then employed to estimate an initial pose. The number of inliers serves as the criterion for deciding whether the feature-based method or the CNN-based method is used. Compared with PoseNet, a previous CNN-based method, the proposed algorithm reduces the average position error by 45.6% and the average orientation error by 67.4% on Microsoft's 7-Scenes dataset, which verifies its effectiveness.
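The retrieval and switching steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how BoVW histograms are compared, so cosine similarity is assumed here. It also gives no inlier threshold, so `min_inliers` is a hypothetical parameter. The direction of the rule (enough inliers means the feature-based pose is kept, otherwise the CNN pose is used) is inferred from the abstract's description of when feature-based methods fail. The names `PnPResult`, `most_similar_index`, and `relocalize` are invented for this sketch, and the PnP + RANSAC solver and the CNN are treated as external inputs.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Callable, Optional, Sequence, Tuple

# A pose as (position xyz, orientation quaternion wxyz); representation is an assumption.
Pose = Tuple[Tuple[float, float, float], Tuple[float, float, float, float]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two BoVW histograms (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_index(query_hist: Sequence[float],
                       train_hists: Sequence[Sequence[float]]) -> int:
    """Index of the training image whose histogram is most similar to the query."""
    if not train_hists:
        raise ValueError("training set is empty")
    return max(range(len(train_hists)),
               key=lambda i: cosine_similarity(query_hist, train_hists[i]))


@dataclass
class PnPResult:
    """Output of a PnP + RANSAC solver (hypothetical container for this sketch)."""
    pose: Pose
    num_inliers: int


def relocalize(pnp_result: Optional[PnPResult],
               cnn_pose_fn: Callable[[], Pose],
               min_inliers: int) -> Tuple[Pose, str]:
    """Keep the feature-based pose if it has enough inliers, else use the CNN pose.

    pnp_result is None when the solver failed; min_inliers is an assumed threshold.
    """
    if pnp_result is not None and pnp_result.num_inliers >= min_inliers:
        return pnp_result.pose, "feature"
    # Too few inliers (e.g. motion blur or a texture-less scene): fall back to the CNN.
    return cnn_pose_fn(), "cnn"
```

Passing the CNN as a callable means the network is only evaluated when the feature-based estimate is rejected. Whether the original method runs the CNN lazily in this way is not stated in the abstract.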