Robust Active Simultaneous Localization and Mapping Based on Bayesian Actor-Critic Reinforcement Learning

Bryan Pedraza, Dimah Dera
{"title":"Robust Active Simultaneous Localization and Mapping Based on Bayesian Actor-Critic Reinforcement Learning","authors":"Bryan Pedraza, Dimah Dera","doi":"10.1109/CAI54212.2023.00035","DOIUrl":null,"url":null,"abstract":"Autonomous mobile robots play vital roles in business, industry, manufacturing, e-commerce, and healthcare. Autonomous navigation and obstacle avoidance involve localizing a robot to actively explore and map an unknown environment autonomously without prior knowledge. Simultaneous localization and mapping (SLAM) present a severe challenge. This paper proposes a novel approach for robust navigation and robot action mapping based on Bayesian Actor-Critic (A2C) reinforcement learning. The principle of Actor-Critic combines policy-based and value-based learning by splitting the model into two: the policy model (Actor) computes the action based on the state, and the value model (Critic) tracks whether the agent is ahead or behind during the game. That feedback guides the training process, where both models participate in a game and optimize their output as time passes. We develop a Bayesian A2C model that generates robot actions and quantifies uncertainty on the actions toward robust exploration and collision-free navigation. We adopt the Bayesian inference and optimize the variational posterior distribution over the unknown model parameters using the evidence lower bound (ELBO) objective. The first-order Taylor series approximates the mean and covariance of the variational distribution passed through non-linear functions in the A2C model. The propagated covariance estimates the robot's action uncertainty at the output of the Actor-network. Experiments demonstrate the superior robustness of the proposed Bayesian A2C model exploring heavily noisy environments compared to deterministic homologs. The proposed framework can be applied to other fields of research (underwater robots, biomedical devices/robots, micro-robots, drones, etc.) where robustness and uncertainty quantification are critical.","PeriodicalId":129324,"journal":{"name":"2023 IEEE Conference on Artificial Intelligence (CAI)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE Conference on Artificial Intelligence (CAI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CAI54212.2023.00035","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Autonomous mobile robots play vital roles in business, industry, manufacturing, e-commerce, and healthcare. Autonomous navigation and obstacle avoidance require a robot to localize itself while actively exploring and mapping an unknown environment without prior knowledge. Simultaneous localization and mapping (SLAM) presents a severe challenge. This paper proposes a novel approach for robust navigation and robot action mapping based on Bayesian Actor-Critic (A2C) reinforcement learning. The Actor-Critic principle combines policy-based and value-based learning by splitting the model in two: the policy model (Actor) computes the action from the current state, and the value model (Critic) tracks whether the agent is ahead or behind during the game. That feedback guides training, with both models participating in the game and improving their outputs over time. We develop a Bayesian A2C model that generates robot actions and quantifies the uncertainty of those actions, enabling robust exploration and collision-free navigation. We adopt Bayesian inference and optimize a variational posterior distribution over the unknown model parameters using the evidence lower bound (ELBO) objective. A first-order Taylor series approximates the mean and covariance of the variational distribution as it passes through the non-linear functions of the A2C model. The propagated covariance estimates the robot's action uncertainty at the output of the Actor network. Experiments demonstrate the superior robustness of the proposed Bayesian A2C model over its deterministic counterparts when exploring heavily noisy environments. The proposed framework can be applied to other research fields (underwater robots, biomedical devices/robots, micro-robots, drones, etc.) where robustness and uncertainty quantification are critical.
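The moment-propagation step described in the abstract is the core of the method: instead of sampling network weights, the mean and covariance of the variational distribution are pushed analytically through each layer, and the covariance emerging from the Actor head is read as action uncertainty. Below is a minimal sketch of that idea for a single fully connected tanh layer; the layer shapes, the fully factorized Gaussian posterior, and the propagate helper are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Hypothetical Bayesian actor layer: weights follow a variational Gaussian
# q(W) with per-element means W_mu and variances W_var (an illustrative
# fully factorized assumption; the paper's architecture may differ).
rng = np.random.default_rng(0)

state_dim, action_dim = 4, 2
W_mu = rng.normal(scale=0.3, size=(action_dim, state_dim))  # variational means
W_var = np.full((action_dim, state_dim), 0.01)              # variational variances


def propagate(x):
    """Push the moments of q(W) through z = W x and then through tanh.

    Uses a first-order Taylor expansion of tanh around the pre-activation
    mean, so E[a] ~ tanh(z_mu) and Cov[a] ~ J Cov[z] J^T with
    J = diag(1 - tanh^2(z_mu)).
    """
    # For deterministic x and independent weight elements:
    # E[z_i] = sum_j W_mu[i, j] x[j], Var[z_i] = sum_j W_var[i, j] x[j]^2.
    z_mu = W_mu @ x
    z_cov = np.diag(W_var @ (x ** 2))

    a_mu = np.tanh(z_mu)
    jac = np.diag(1.0 - a_mu ** 2)   # derivative of tanh at z_mu
    a_cov = jac @ z_cov @ jac.T
    return a_mu, a_cov


state = np.array([0.5, -1.0, 0.25, 0.0])
action_mean, action_cov = propagate(state)
print("action mean:", action_mean)
print("action uncertainty (covariance diagonal):", np.diag(action_cov))
```

For a deterministic input and independent weight elements the pre-activation covariance is diagonal; a deeper network would simply repeat the same linearize-and-propagate step layer by layer, which is how the covariance reaches the Actor output.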
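Training pairs that propagation with the ELBO objective, which trades a data-fit term against a KL penalty tying the variational posterior to its prior. The following is a hedged sketch for a fully factorized Gaussian posterior with a standard normal prior; the paper's actual prior and likelihood terms are not specified here, and kl_diag_gaussian_to_std_normal and elbo are hypothetical helper names.

```python
import numpy as np


def kl_diag_gaussian_to_std_normal(mu, var):
    """KL( N(mu, diag(var)) || N(0, I) ), in closed form for factorized Gaussians."""
    return 0.5 * np.sum(var + mu ** 2 - 1.0 - np.log(var))


def elbo(log_likelihood, mu, var, kl_weight=1.0):
    """Evidence lower bound: expected log-likelihood minus a weighted KL term.

    `log_likelihood` stands in for the data-fit term (e.g., the
    advantage-weighted log-probability of the taken action under the
    propagated policy distribution); maximizing the ELBO balances fit
    against posterior complexity.
    """
    return log_likelihood - kl_weight * kl_diag_gaussian_to_std_normal(mu, var)


# Toy usage with 8 variational parameters.
mu = np.zeros(8)
var = np.full(8, 0.1)
print("ELBO:", elbo(log_likelihood=-1.25, mu=mu, var=var))
```

Scaling the KL term via kl_weight is a common practical knob in variational training for balancing the fit term against the regularization imposed by the prior.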