{"title":"Robust Active Simultaneous Localization and Mapping Based on Bayesian Actor-Critic Reinforcement Learning","authors":"Bryan Pedraza, Dimah Dera","doi":"10.1109/CAI54212.2023.00035","DOIUrl":null,"url":null,"abstract":"Autonomous mobile robots play vital roles in business, industry, manufacturing, e-commerce, and healthcare. Autonomous navigation and obstacle avoidance involve localizing a robot to actively explore and map an unknown environment autonomously without prior knowledge. Simultaneous localization and mapping (SLAM) present a severe challenge. This paper proposes a novel approach for robust navigation and robot action mapping based on Bayesian Actor-Critic (A2C) reinforcement learning. The principle of Actor-Critic combines policy-based and value-based learning by splitting the model into two: the policy model (Actor) computes the action based on the state, and the value model (Critic) tracks whether the agent is ahead or behind during the game. That feedback guides the training process, where both models participate in a game and optimize their output as time passes. We develop a Bayesian A2C model that generates robot actions and quantifies uncertainty on the actions toward robust exploration and collision-free navigation. We adopt the Bayesian inference and optimize the variational posterior distribution over the unknown model parameters using the evidence lower bound (ELBO) objective. The first-order Taylor series approximates the mean and covariance of the variational distribution passed through non-linear functions in the A2C model. The propagated covariance estimates the robot's action uncertainty at the output of the Actor-network. Experiments demonstrate the superior robustness of the proposed Bayesian A2C model exploring heavily noisy environments compared to deterministic homologs. The proposed framework can be applied to other fields of research (underwater robots, biomedical devices/robots, micro-robots, drones, etc.) where robustness and uncertainty quantification are critical.","PeriodicalId":129324,"journal":{"name":"2023 IEEE Conference on Artificial Intelligence (CAI)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE Conference on Artificial Intelligence (CAI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CAI54212.2023.00035","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Autonomous mobile robots play vital roles in business, industry, manufacturing, e-commerce, and healthcare. Autonomous navigation and obstacle avoidance require a robot to localize itself while actively exploring and mapping an unknown environment without prior knowledge. Simultaneous localization and mapping (SLAM) presents a severe challenge. This paper proposes a novel approach to robust navigation and robot action mapping based on Bayesian Actor-Critic (A2C) reinforcement learning. The Actor-Critic principle combines policy-based and value-based learning by splitting the model in two: the policy model (Actor) computes the action from the current state, and the value model (Critic) tracks whether the agent is ahead or behind during the game. That feedback guides the training process, in which both models participate in the game and improve their outputs over time. We develop a Bayesian A2C model that generates robot actions and quantifies the uncertainty of those actions for robust exploration and collision-free navigation. We adopt Bayesian inference and optimize the variational posterior distribution over the unknown model parameters using the evidence lower bound (ELBO) objective. A first-order Taylor series approximates the mean and covariance of the variational distribution as it passes through the non-linear functions of the A2C model. The propagated covariance estimates the uncertainty of the robot's actions at the output of the Actor network. Experiments demonstrate the superior robustness of the proposed Bayesian A2C model when exploring heavily noisy environments, compared to its deterministic counterparts. The proposed framework can be applied to other fields of research (underwater robots, biomedical devices/robots, micro-robots, drones, etc.) where robustness and uncertainty quantification are critical.
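To make the Actor-Critic split concrete, the sketch below shows the generic two-headed architecture the abstract describes: a shared encoder feeding a policy head (Actor) and a state-value head (Critic), with the advantage signal coupling the two losses. This is a minimal, assumed illustration in PyTorch; the names (`ActorCritic`, `a2c_loss`), layer sizes, and discrete-action setup are ours, not the paper's code:

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    """Minimal A2C-style split: a shared encoder feeds a policy head
    (Actor) and a state-value head (Critic). Illustrative only."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.actor = nn.Linear(hidden, n_actions)   # action logits
        self.critic = nn.Linear(hidden, 1)          # state value V(s)

    def forward(self, obs: torch.Tensor):
        h = self.encoder(obs)
        dist = torch.distributions.Categorical(logits=self.actor(h))
        return dist, self.critic(h)

def a2c_loss(dist, value, action, ret, value_coef=0.5, entropy_coef=0.01):
    """A2C update sketch: the Critic's estimate V(s) turns the return R
    into an advantage A = R - V(s), which weights the Actor's
    policy-gradient loss; an entropy bonus encourages exploration."""
    advantage = ret - value.squeeze(-1)
    policy_loss = -(dist.log_prob(action) * advantage.detach()).mean()
    value_loss = advantage.pow(2).mean()
    return policy_loss + value_coef * value_loss - entropy_coef * dist.entropy().mean()
```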
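The variational objective mentioned in the abstract is the standard evidence lower bound, and the moment propagation is the usual first-order Taylor (delta-method) approximation. Written in common notation (ours, not necessarily the paper's), with variational posterior $q_\phi(\theta)$ over the unknown parameters $\theta$, data $\mathcal{D}$, and prior $p(\theta)$:

```latex
\mathcal{L}_{\mathrm{ELBO}}(\phi)
  = \mathbb{E}_{q_\phi(\theta)}\!\left[\log p(\mathcal{D}\mid\theta)\right]
  - \mathrm{KL}\!\left(q_\phi(\theta)\,\|\,p(\theta)\right)

% First-order Taylor propagation of the mean and covariance through a
% non-linearity g, where J_g is the Jacobian of g evaluated at the mean:
\mu_y \approx g(\mu_x), \qquad
\Sigma_y \approx J_g(\mu_x)\,\Sigma_x\,J_g(\mu_x)^{\top}
```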
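A small numeric sketch of that propagation step (assumed for illustration, not the authors' implementation): for a linear layer followed by a ReLU, the affine map transports the covariance exactly via its Jacobian $W$, and the ReLU's Jacobian at the mean is diagonal, so $J \Sigma J^\top$ reduces to an elementwise mask:

```python
import numpy as np

def propagate_linear_relu(mu, Sigma, W, b):
    """First-order Taylor propagation of (mean, covariance) through
    y = relu(W x + b). The ReLU Jacobian is diagonal with entries 1
    where the pre-activation at the mean is positive, else 0."""
    z = W @ mu + b                      # pre-activation mean
    Sigma_z = W @ Sigma @ W.T           # affine map: exact covariance transport
    d = (z > 0).astype(float)           # diagonal of the ReLU Jacobian
    mu_y = np.maximum(z, 0.0)           # mean pushed through the ReLU
    Sigma_y = Sigma_z * np.outer(d, d)  # J Sigma J^T with diagonal J
    return mu_y, Sigma_y
```

Applied layer by layer up to the Actor's output, this yields the propagated covariance the abstract uses as the robot's action-uncertainty estimate.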