{"title":"一种融合视觉和异方差运动估计的深度神经网络方法,用于低swap机器人应用","authors":"Jared Shamwell, W. Nothwang, D. Perlis","doi":"10.1109/MFI.2017.8170407","DOIUrl":null,"url":null,"abstract":"Due both to the speed and quality of their sensors and restrictive on-board computational capabilities, current state-of-the-art (SOA) size, weight, and power (SWaP) constrained autonomous robotic systems are limited in their abilities to sample, fuse, and analyze sensory data for state estimation. Aimed at improving SWaP-constrained robotic state estimation, we present Multi-Hypothesis DeepEfference (MHDE) — an unsupervised, deep convolutional-deconvolutional sensor fusion network that learns to intelligently combine noisy heterogeneous sensor data to predict several probable hypotheses for the dense, pixel-level correspondence between a source image and an unseen target image. This new multi-hypothesis formulation of our previous architecture, DeepEfference [1], has been augmented to handle dynamic heteroscedastic sensor and motion noise and computes hypothesis image mappings and predictions at 150–400 Hz depending on the number of hypotheses being generated. MHDE fuses noisy, heterogeneous sensory inputs using two parallel architectural pathways and n (1, 2, 4, or 8 in this work) multi-hypothesis generation subpathways to generate n pixel-level predictions and correspondences between source and target images. We evaluated MHDE on the KITTI Odometry dataset [2] and benchmarked it against DeepEfference [1] and DeepMatching [3] by mean pixel error and runtime. MHDE with 8 hypotheses outperformed DeepEfference in root mean squared (RMSE) pixel error by 103% in the maximum heteroscedastic noise condition and by 18% in the noise-free condition. 
MHDE with 8 hypotheses was over 5, 000% faster than DeepMatching with only a 3% increase in RMSE.","PeriodicalId":402371,"journal":{"name":"2017 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"A deep neural network approach to fusing vision and heteroscedastic motion estimates for low-SWaP robotic applications\",\"authors\":\"Jared Shamwell, W. Nothwang, D. Perlis\",\"doi\":\"10.1109/MFI.2017.8170407\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Due both to the speed and quality of their sensors and restrictive on-board computational capabilities, current state-of-the-art (SOA) size, weight, and power (SWaP) constrained autonomous robotic systems are limited in their abilities to sample, fuse, and analyze sensory data for state estimation. Aimed at improving SWaP-constrained robotic state estimation, we present Multi-Hypothesis DeepEfference (MHDE) — an unsupervised, deep convolutional-deconvolutional sensor fusion network that learns to intelligently combine noisy heterogeneous sensor data to predict several probable hypotheses for the dense, pixel-level correspondence between a source image and an unseen target image. This new multi-hypothesis formulation of our previous architecture, DeepEfference [1], has been augmented to handle dynamic heteroscedastic sensor and motion noise and computes hypothesis image mappings and predictions at 150–400 Hz depending on the number of hypotheses being generated. MHDE fuses noisy, heterogeneous sensory inputs using two parallel architectural pathways and n (1, 2, 4, or 8 in this work) multi-hypothesis generation subpathways to generate n pixel-level predictions and correspondences between source and target images. 
We evaluated MHDE on the KITTI Odometry dataset [2] and benchmarked it against DeepEfference [1] and DeepMatching [3] by mean pixel error and runtime. MHDE with 8 hypotheses outperformed DeepEfference in root mean squared (RMSE) pixel error by 103% in the maximum heteroscedastic noise condition and by 18% in the noise-free condition. MHDE with 8 hypotheses was over 5, 000% faster than DeepMatching with only a 3% increase in RMSE.\",\"PeriodicalId\":402371,\"journal\":{\"name\":\"2017 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/MFI.2017.8170407\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MFI.2017.8170407","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A deep neural network approach to fusing vision and heteroscedastic motion estimates for low-SWaP robotic applications
Due both to the speed and quality of their sensors and to restrictive on-board computational capabilities, current state-of-the-art (SOA) size, weight, and power (SWaP) constrained autonomous robotic systems are limited in their abilities to sample, fuse, and analyze sensory data for state estimation. Aimed at improving SWaP-constrained robotic state estimation, we present Multi-Hypothesis DeepEfference (MHDE) — an unsupervised, deep convolutional-deconvolutional sensor fusion network that learns to intelligently combine noisy heterogeneous sensor data to predict several probable hypotheses for the dense, pixel-level correspondence between a source image and an unseen target image. This new multi-hypothesis formulation of our previous architecture, DeepEfference [1], has been augmented to handle dynamic heteroscedastic sensor and motion noise and computes hypothesis image mappings and predictions at 150–400 Hz depending on the number of hypotheses being generated. MHDE fuses noisy, heterogeneous sensory inputs using two parallel architectural pathways and n (1, 2, 4, or 8 in this work) multi-hypothesis generation subpathways to generate n pixel-level predictions and correspondences between source and target images. We evaluated MHDE on the KITTI Odometry dataset [2] and benchmarked it against DeepEfference [1] and DeepMatching [3] by mean pixel error and runtime. MHDE with 8 hypotheses outperformed DeepEfference in root mean squared error (RMSE) in pixels by 103% in the maximum heteroscedastic noise condition and by 18% in the noise-free condition. MHDE with 8 hypotheses was over 5,000% faster than DeepMatching with only a 3% increase in RMSE.
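To make the multi-hypothesis idea concrete, the sketch below illustrates one common way such architectures are scored: each of the n subpathways emits a candidate reconstruction of the target image, and the candidate with the lowest RMSE pixel error against the target "wins". This is a minimal illustrative sketch, not the paper's actual network or loss — the winner-take-all selection rule, the array shapes, and the per-hypothesis noise scales are all assumptions introduced here for illustration.

```python
import numpy as np


def rmse(pred, target):
    """Root-mean-squared pixel error between a predicted and a target image."""
    return np.sqrt(np.mean((pred - target) ** 2))


def best_hypothesis(hypotheses, target):
    """Winner-take-all selection over n hypothesis predictions.

    `hypotheses` is an (n, H, W) array of candidate target-image
    reconstructions; returns the index of the hypothesis with the
    lowest RMSE against `target`, along with that error. The scoring
    rule here is an illustrative assumption, not taken from the paper.
    """
    errors = [rmse(h, target) for h in hypotheses]
    i = int(np.argmin(errors))
    return i, errors[i]


# Toy example: 4 "hypotheses" corrupting the target with different
# (heteroscedastic) noise scales; the lowest-noise one should win.
rng = np.random.default_rng(0)
target = rng.random((8, 8))
noise_scales = [0.5, 0.1, 0.3, 0.8]  # per-hypothesis noise magnitude
hyps = np.stack([target + s * rng.standard_normal((8, 8)) for s in noise_scales])
idx, err = best_hypothesis(hyps, target)
```

The intuition this captures is why more hypotheses help under heteroscedastic noise: with several candidates drawn under different noise assumptions, at least one is likely to land close to the true correspondence, so the selected-hypothesis error drops as n grows.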