Mask-SLAM: Robust Feature-Based Monocular SLAM by Masking Using Semantic Segmentation
Masaya Kaneko, Kazuya Iwami, Toru Ogawa, T. Yamasaki, K. Aizawa
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), June 2018. DOI: 10.1109/CVPRW.2018.00063
In this paper, we propose a novel method that combines monocular visual simultaneous localization and mapping (vSLAM) with deep-learning-based semantic segmentation. For stable operation, vSLAM requires feature points on static objects. In conventional vSLAM, random sample consensus (RANSAC) [5] is used to select those feature points. However, if a major portion of the view is occupied by moving objects, many feature points become inappropriate and RANSAC does not perform well. Based on our empirical studies, feature points in the sky and on cars often cause errors in vSLAM. We propose a new framework that excludes feature points using a mask produced by semantic segmentation. Excluding feature points in masked areas enables vSLAM to stably estimate camera motion. In our framework, we build on ORB-SLAM [15], a state-of-the-art implementation of monocular vSLAM. For our experiments, we created vSLAM evaluation datasets under various conditions using the CARLA simulator [3]. Compared to state-of-the-art methods, our method achieves significantly higher accuracy.
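To make the masking idea concrete, the following is a minimal sketch of how segmentation-based keypoint exclusion can be wired into a feature pipeline. It is not the authors' ORB-SLAM modification; it assumes OpenCV's ORB detector, a per-pixel label image from an off-the-shelf segmentation network, and illustrative class IDs for "sky" and "car" (the label map and function names are hypothetical).

```python
import cv2
import numpy as np

# Assumed label IDs for the classes the paper masks out (sky, car).
# These values depend on the segmentation model's label map.
MASKED_CLASS_IDS = {10, 13}

def build_static_mask(seg_labels: np.ndarray) -> np.ndarray:
    """Return an 8-bit mask: 255 on static regions, 0 on masked classes."""
    dynamic = np.isin(seg_labels, list(MASKED_CLASS_IDS))
    return np.where(dynamic, 0, 255).astype(np.uint8)

def detect_masked_orb_features(gray: np.ndarray, seg_labels: np.ndarray):
    """Detect ORB keypoints only where the segmentation marks the scene as static."""
    mask = build_static_mask(seg_labels)
    orb = cv2.ORB_create(nfeatures=2000)
    # detectAndCompute only extracts keypoints where mask != 0, which mirrors
    # the exclusion of feature points in masked (sky/car) areas.
    keypoints, descriptors = orb.detectAndCompute(gray, mask)
    return keypoints, descriptors
```

In the paper's actual system, this filtering happens inside the ORB-SLAM front end, so that masked feature points never enter camera-motion estimation or mapping.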