Context-Generative Default Policy for Bounded Rational Agent

Durgakant Pushp, Junhong Xu, Zheng Chen, Lantao Liu
{"title":"Context-Generative Default Policy for Bounded Rational Agent","authors":"Durgakant Pushp, Junhong Xu, Zheng Chen, Lantao Liu","doi":"arxiv-2409.11604","DOIUrl":null,"url":null,"abstract":"Bounded rational agents often make decisions by evaluating a finite selection\nof choices, typically derived from a reference point termed the $`$default\npolicy,' based on previous experience. However, the inherent rigidity of the\nstatic default policy presents significant challenges for agents when operating\nin unknown environment, that are not included in agent's prior knowledge. In\nthis work, we introduce a context-generative default policy that leverages the\nregion observed by the robot to predict unobserved part of the environment,\nthereby enabling the robot to adaptively adjust its default policy based on\nboth the actual observed map and the $\\textit{imagined}$ unobserved map.\nFurthermore, the adaptive nature of the bounded rationality framework enables\nthe robot to manage unreliable or incorrect imaginations by selectively\nsampling a few trajectories in the vicinity of the default policy. Our approach\nutilizes a diffusion model for map prediction and a sampling-based planning\nwith B-spline trajectory optimization to generate the default policy. Extensive\nevaluations reveal that the context-generative policy outperforms the baseline\nmethods in identifying and avoiding unseen obstacles. Additionally, real-world\nexperiments conducted with the Crazyflie drones demonstrate the adaptability of\nour proposed method, even when acting in environments outside the domain of the\ntraining distribution.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Robotics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.11604","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Bounded rational agents often make decisions by evaluating a finite selection of choices, typically derived from a reference point termed the $`$default policy,' based on previous experience. However, the inherent rigidity of the static default policy presents significant challenges for agents when operating in unknown environment, that are not included in agent's prior knowledge. In this work, we introduce a context-generative default policy that leverages the region observed by the robot to predict unobserved part of the environment, thereby enabling the robot to adaptively adjust its default policy based on both the actual observed map and the $\textit{imagined}$ unobserved map. Furthermore, the adaptive nature of the bounded rationality framework enables the robot to manage unreliable or incorrect imaginations by selectively sampling a few trajectories in the vicinity of the default policy. Our approach utilizes a diffusion model for map prediction and a sampling-based planning with B-spline trajectory optimization to generate the default policy. Extensive evaluations reveal that the context-generative policy outperforms the baseline methods in identifying and avoiding unseen obstacles. Additionally, real-world experiments conducted with the Crazyflie drones demonstrate the adaptability of our proposed method, even when acting in environments outside the domain of the training distribution.
有界理性代理的情境生成默认策略
有界理性代理通常通过评估有限的选择来做出决策,这些选择通常来自于一个被称为"$$默认政策 "的参考点,该参考点基于以往的经验。然而,当代理在未知环境中工作时,静态默认政策的固有刚性给代理带来了巨大挑战,而这些环境并不包括在代理的先验知识中。在这项工作中,我们引入了一种情境生成默认策略,它利用机器人观察到的区域来预测环境中未观察到的部分,从而使机器人能够根据实际观察到的地图和未观察到的地图自适应地调整其默认策略。此外,有界理性框架的自适应性质使机器人能够通过选择性地采样默认策略附近的一些轨迹来管理不可靠或不正确的想象。我们的方法利用扩散模型进行地图预测,并利用基于采样的规划和 B 样条轨迹优化来生成默认策略。广泛的评估表明,情境生成策略在识别和避开未知障碍物方面优于基准方法。此外,使用 Crazyflie 无人机进行的真实世界实验证明了我们提出的方法的适应性,即使在训练分布领域之外的环境中也是如此。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信