{"title":"A Novel End-to-End Visual Odometry Framework Based on Deep Neural Network","authors":"Yinan Wang, Rongchuan Cao, Yingzhou Guan, Xiaoli Zhang","doi":"10.1109/ICCEAI55464.2022.00115","DOIUrl":null,"url":null,"abstract":"Most of the current visual odometry (VO) methods are designed based on a set of standard procedures, including camera calibration, feature extraction, feature matching (or tracking), motion estimation, local optimization, etc. Though some of these methods excel in accuracy and robustness, they usually require careful design and specific fine-tuning to ensure performance in different scenarios. Also, recovering the absolute scale of the monocular VO usually requires some prior knowledge. In this paper, we propose an end-to-end framework based on U-Net and deep recurrent neural networks (RNNs) for training and deployment, which directly estimates poses from a sequence of raw RGB images without using any modules of the conventional VO system and thus without fine-tuning the parameters of the VO system. Also, no prior knowledge of the scene is necessary in our method. To solve the problem that current deep learning methods only predict poses between frames, we first use U-Net to automatically learn effective feature representations for VO, and then utilize RNN to implicitly model sequence dynamics and relationships. This method makes full use of the context information of sequence frames to achieve accurate and robust VO localization. Experimental results show the superiority of our method compared with several widely used VO methods.","PeriodicalId":414181,"journal":{"name":"2022 International Conference on Computer Engineering and Artificial Intelligence (ICCEAI)","volume":"182 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 International Conference on Computer Engineering and Artificial Intelligence (ICCEAI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCEAI55464.2022.00115","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Most current visual odometry (VO) methods follow a standard pipeline that includes camera calibration, feature extraction, feature matching (or tracking), motion estimation, and local optimization. Although some of these methods excel in accuracy and robustness, they usually require careful design and scenario-specific fine-tuning to maintain performance. In addition, recovering absolute scale in monocular VO usually requires prior knowledge. In this paper, we propose an end-to-end framework based on U-Net and deep recurrent neural networks (RNNs) for training and deployment, which directly estimates poses from a sequence of raw RGB images without using any module of the conventional VO pipeline and therefore without tuning its parameters. Our method also requires no prior knowledge of the scene. To address the limitation that current deep learning methods predict only frame-to-frame poses, we first use U-Net to automatically learn effective feature representations for VO, and then use an RNN to implicitly model sequence dynamics and inter-frame relationships. The method fully exploits the context of sequential frames to achieve accurate and robust VO localization. Experimental results show the superiority of our method compared with several widely used VO methods.
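To make the described architecture concrete, the following is a minimal PyTorch sketch based only on the abstract: a U-Net-style encoder extracts features from stacked pairs of consecutive RGB frames, an LSTM models the sequence dynamics, and a linear head regresses 6-DoF relative poses. All layer widths, the pose parameterization (translation plus Euler angles), and module names such as EndToEndVO are illustrative assumptions, not the authors' exact configuration (the paper's U-Net decoder and skip connections are omitted here, since only encoder features feed the recurrent stage in this sketch).

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Two 3x3 convolutions with ReLU, the basic U-Net building block."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        )
    def forward(self, x):
        return self.net(x)

class UNetEncoder(nn.Module):
    """U-Net-style contracting path; widths are assumed, not from the paper."""
    def __init__(self, in_ch=6, widths=(32, 64, 128, 256)):
        super().__init__()
        blocks, prev = [], in_ch
        for w in widths:
            blocks += [ConvBlock(prev, w), nn.MaxPool2d(2)]
            prev = w
        self.encoder = nn.Sequential(*blocks)
        self.pool = nn.AdaptiveAvgPool2d(1)  # collapse spatial dimensions

    def forward(self, x):                     # x: (B, 6, H, W) stacked frame pair
        return self.pool(self.encoder(x)).flatten(1)  # (B, widths[-1])

class EndToEndVO(nn.Module):
    """Hypothetical end-to-end VO model: encoder -> LSTM -> pose head."""
    def __init__(self, feat_dim=256, hidden=512):
        super().__init__()
        self.encoder = UNetEncoder(in_ch=6, widths=(32, 64, 128, feat_dim))
        self.rnn = nn.LSTM(feat_dim, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, 6)      # 3 translation + 3 rotation (assumed)

    def forward(self, frames):                # frames: (B, T, 3, H, W)
        B, T = frames.shape[:2]
        # Stack each consecutive frame pair along the channel axis.
        pairs = torch.cat([frames[:, :-1], frames[:, 1:]], dim=2)  # (B, T-1, 6, H, W)
        feats = self.encoder(pairs.flatten(0, 1)).view(B, T - 1, -1)
        out, _ = self.rnn(feats)              # context across the whole sequence
        return self.head(out)                 # (B, T-1, 6) relative poses

# Usage: poses = EndToEndVO()(torch.randn(2, 5, 3, 128, 416))  # -> (2, 4, 6)
```

The LSTM is what lets each pose estimate draw on the context of earlier frames rather than a single image pair, which is the sequence-modeling role the abstract assigns to the RNN; in practice such models are typically trained with an L2 loss on ground-truth relative poses.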