Robust Active Simultaneous Localization and Mapping Based on Bayesian Actor-Critic Reinforcement Learning

Bryan Pedraza, Dimah Dera
{"title":"Robust Active Simultaneous Localization and Mapping Based on Bayesian Actor-Critic Reinforcement Learning","authors":"Bryan Pedraza, Dimah Dera","doi":"10.1109/CAI54212.2023.00035","DOIUrl":null,"url":null,"abstract":"Autonomous mobile robots play vital roles in business, industry, manufacturing, e-commerce, and healthcare. Autonomous navigation and obstacle avoidance involve localizing a robot to actively explore and map an unknown environment autonomously without prior knowledge. Simultaneous localization and mapping (SLAM) present a severe challenge. This paper proposes a novel approach for robust navigation and robot action mapping based on Bayesian Actor-Critic (A2C) reinforcement learning. The principle of Actor-Critic combines policy-based and value-based learning by splitting the model into two: the policy model (Actor) computes the action based on the state, and the value model (Critic) tracks whether the agent is ahead or behind during the game. That feedback guides the training process, where both models participate in a game and optimize their output as time passes. We develop a Bayesian A2C model that generates robot actions and quantifies uncertainty on the actions toward robust exploration and collision-free navigation. We adopt the Bayesian inference and optimize the variational posterior distribution over the unknown model parameters using the evidence lower bound (ELBO) objective. The first-order Taylor series approximates the mean and covariance of the variational distribution passed through non-linear functions in the A2C model. The propagated covariance estimates the robot's action uncertainty at the output of the Actor-network. Experiments demonstrate the superior robustness of the proposed Bayesian A2C model exploring heavily noisy environments compared to deterministic homologs. The proposed framework can be applied to other fields of research (underwater robots, biomedical devices/robots, micro-robots, drones, etc.) where robustness and uncertainty quantification are critical.","PeriodicalId":129324,"journal":{"name":"2023 IEEE Conference on Artificial Intelligence (CAI)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE Conference on Artificial Intelligence (CAI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CAI54212.2023.00035","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Autonomous mobile robots play vital roles in business, industry, manufacturing, e-commerce, and healthcare. Autonomous navigation and obstacle avoidance require a robot to localize itself while actively exploring and mapping an unknown environment without prior knowledge. Simultaneous localization and mapping (SLAM) presents a severe challenge. This paper proposes a novel approach for robust navigation and robot action mapping based on Bayesian Actor-Critic (A2C) reinforcement learning. The Actor-Critic principle combines policy-based and value-based learning by splitting the model in two: the policy model (Actor) computes the action from the current state, and the value model (Critic) tracks whether the agent is ahead or behind during the game. That feedback guides training, with both models participating in the game and improving their outputs over time. We develop a Bayesian A2C model that generates robot actions and quantifies the uncertainty of those actions, enabling robust exploration and collision-free navigation. We adopt Bayesian inference and optimize a variational posterior distribution over the unknown model parameters using the evidence lower bound (ELBO) objective. A first-order Taylor series approximates the mean and covariance of the variational distribution as it passes through the non-linear functions of the A2C model. The propagated covariance estimates the robot's action uncertainty at the output of the Actor network. Experiments demonstrate the superior robustness of the proposed Bayesian A2C model over its deterministic counterparts when exploring heavily noisy environments. The proposed framework can be applied to other research fields (underwater robots, biomedical devices/robots, micro-robots, drones, etc.) where robustness and uncertainty quantification are critical.
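The moment-propagation step described in the abstract is the core of the method: instead of sampling network weights, the mean and covariance of the variational distribution are pushed analytically through each layer, and the covariance emerging from the Actor head is read as action uncertainty. Below is a minimal sketch of that idea for a single fully connected tanh layer; the layer shapes, the fully factorized Gaussian posterior, and the propagate helper are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Hypothetical Bayesian actor layer: weights follow a variational Gaussian
# q(W) with per-element means W_mu and variances W_var (an illustrative
# fully factorized assumption; the paper's architecture may differ).
rng = np.random.default_rng(0)

state_dim, action_dim = 4, 2
W_mu = rng.normal(scale=0.3, size=(action_dim, state_dim))  # variational means
W_var = np.full((action_dim, state_dim), 0.01)              # variational variances


def propagate(x):
    """Push the moments of q(W) through z = W x and then through tanh.

    Uses a first-order Taylor expansion of tanh around the pre-activation
    mean, so E[a] ~ tanh(z_mu) and Cov[a] ~ J Cov[z] J^T with
    J = diag(1 - tanh^2(z_mu)).
    """
    # For deterministic x and independent weight elements:
    # E[z_i] = sum_j W_mu[i, j] x[j], Var[z_i] = sum_j W_var[i, j] x[j]^2.
    z_mu = W_mu @ x
    z_cov = np.diag(W_var @ (x ** 2))

    a_mu = np.tanh(z_mu)
    jac = np.diag(1.0 - a_mu ** 2)   # derivative of tanh at z_mu
    a_cov = jac @ z_cov @ jac.T
    return a_mu, a_cov


state = np.array([0.5, -1.0, 0.25, 0.0])
action_mean, action_cov = propagate(state)
print("action mean:", action_mean)
print("action uncertainty (covariance diagonal):", np.diag(action_cov))
```

For a deterministic input and independent weight elements the pre-activation covariance is diagonal; a deeper network would simply repeat the same linearize-and-propagate step layer by layer, which is how the covariance reaches the Actor output.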
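Training pairs that propagation with the ELBO objective, which trades a data-fit term against a KL penalty tying the variational posterior to its prior. The following is a hedged sketch for a fully factorized Gaussian posterior with a standard normal prior; the paper's actual prior and likelihood terms are not specified here, and kl_diag_gaussian_to_std_normal and elbo are hypothetical helper names.

```python
import numpy as np


def kl_diag_gaussian_to_std_normal(mu, var):
    """KL( N(mu, diag(var)) || N(0, I) ), in closed form for factorized Gaussians."""
    return 0.5 * np.sum(var + mu ** 2 - 1.0 - np.log(var))


def elbo(log_likelihood, mu, var, kl_weight=1.0):
    """Evidence lower bound: expected log-likelihood minus a weighted KL term.

    `log_likelihood` stands in for the data-fit term (e.g., the
    advantage-weighted log-probability of the taken action under the
    propagated policy distribution); maximizing the ELBO balances fit
    against posterior complexity.
    """
    return log_likelihood - kl_weight * kl_diag_gaussian_to_std_normal(mu, var)


# Toy usage with 8 variational parameters.
mu = np.zeros(8)
var = np.full(8, 0.1)
print("ELBO:", elbo(log_likelihood=-1.25, mu=mu, var=var))
```

Scaling the KL term via kl_weight is a common practical knob in variational training for balancing the fit term against the regularization imposed by the prior.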