Trustworthy navigation with variational policy in deep reinforcement learning.

IF 3.0 · JCR Q2 (Robotics)
Frontiers in Robotics and AI · Vol. 12, Article 1652050 · Pub Date: 2025-10-08 · eCollection Date: 2025-01-01 · DOI: 10.3389/frobt.2025.1652050
Karla Bockrath, Liam Ernst, Rohaan Nadeem, Bryan Pedraza, Dimah Dera
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12541417/pdf/

Abstract

Introduction: Developing a reliable and trustworthy navigation policy with deep reinforcement learning (DRL) for mobile robots is extremely challenging, particularly in real-world, highly dynamic environments. In particular, exploring and navigating unknown environments without prior knowledge, while avoiding obstacles and collisions, remains difficult for mobile robots.

Methods: This study introduces a novel trustworthy navigation framework, Trust-Nav, that uses variational policy learning to quantify uncertainty in the estimation of the robot's action, localization, and map representation. Trust-Nav employs a Bayesian variational approximation of the posterior distribution over the policy network's parameters. Policy-based and value-based learning are combined to guide the robot's actions in unknown environments. We derive the propagation of variational moments through all layers of the policy network and employ a first-order approximation for the nonlinear activation functions. The uncertainty in the robot's action is measured by the propagated variational covariance in the DRL policy network, while the uncertainty in the robot's localization and mapping is embedded in the reward function and draws on the classical theory of optimal experimental design. The total loss function optimizes the parameters of the policy and value networks to maximize the robot's cumulative reward in an unknown environment.
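The moment-propagation step described above can be sketched numerically. The snippet below is a minimal illustration, not the paper's implementation: it assumes a single fully connected layer with independent Gaussian weights (element-wise variances `var_W`), a first-order (delta-method) approximation through a ReLU activation, a trace-of-covariance action-uncertainty measure, and a D-optimality-style log-determinant term as the experimental-design reward contribution. All function names, the hypothetical localization covariance `P`, and the exact reward form are assumptions for illustration only.

```python
import numpy as np

def propagate_linear(mu_x, Sigma_x, M, var_W):
    """First two moments of z = W x for variational weights W with mean
    matrix M and element-wise variances var_W, and random input
    x ~ (mu_x, Sigma_x). Assumes weights are mutually independent and
    independent of the input."""
    mu_z = M @ mu_x
    # Input covariance pushed through the mean weights,
    # plus a diagonal contribution from the weight variances.
    Sigma_z = M @ Sigma_x @ M.T
    Sigma_z += np.diag(var_W @ (np.diag(Sigma_x) + mu_x**2))
    return mu_z, Sigma_z

def propagate_relu(mu_z, Sigma_z):
    """First-order (delta-method) approximation through ReLU:
    a ~ g(mu_z) + g'(mu_z) (z - mu_z), so Sigma_a = J Sigma_z J^T
    with J = diag(g'(mu_z))."""
    d = (mu_z > 0).astype(mu_z.dtype)   # ReLU derivative at the mean
    mu_a = np.maximum(mu_z, 0.0)
    Sigma_a = Sigma_z * np.outer(d, d)  # elementwise form of J Sigma J^T
    return mu_a, Sigma_a

rng = np.random.default_rng(0)
mu_x = rng.normal(size=4)               # input mean
Sigma_x = 0.05 * np.eye(4)              # input covariance
M = rng.normal(size=(3, 4))             # variational weight means
var_W = 0.01 * np.ones((3, 4))          # variational weight variances

mu, Sigma = propagate_relu(*propagate_linear(mu_x, Sigma_x, M, var_W))

# Scalar self-assessment of the action: total propagated variance.
action_uncertainty = np.trace(Sigma)

# D-optimality-style reward term on a hypothetical localization
# covariance P (smaller uncertainty -> larger reward).
P = 0.05 * np.eye(3)
d_opt_reward = -np.log(np.linalg.det(P))
```

In a full network this propagation would be chained layer by layer, with the final covariance over the action distribution serving as the policy's uncertainty signal.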

Results: Experiments conducted using the Gazebo robotics simulator demonstrate the superior performance of the proposed Trust-Nav model in achieving robust autonomous navigation and mapping.

Discussion: Trust-Nav consistently outperforms deterministic DRL approaches, particularly in complicated environments involving noisy conditions and adversarial attacks. This integration of uncertainty into the policy network promotes safer and more reliable navigation, especially in complex or unpredictable environments. Trust-Nav offers a step toward deployable, self-aware robotic systems capable of recognizing and responding to their own limitations.

Source journal: Frontiers in Robotics and AI
CiteScore: 6.50
Self-citation rate: 5.90%
Articles per year: 355
Review time: 14 weeks
Journal description: Frontiers in Robotics and AI publishes rigorously peer-reviewed research covering all theory and applications of robotics, technology, and artificial intelligence, from biomedical to space robotics.