Context-Generative Default Policy for Bounded Rational Agent

arXiv - CS - Robotics. Pub Date: 2024-09-17. DOI: arxiv-2409.11604

Durgakant Pushp, Junhong Xu, Zheng Chen, Lantao Liu


Citations: 0

Abstract

Bounded rational agents often make decisions by evaluating a finite selection of choices, typically derived from a reference point termed the "default policy", which is based on previous experience. However, the inherent rigidity of a static default policy poses significant challenges when agents operate in unknown environments that are not covered by their prior knowledge. In this work, we introduce a context-generative default policy that uses the region observed by the robot to predict the unobserved part of the environment, enabling the robot to adapt its default policy based on both the actually observed map and the "imagined" unobserved map. Furthermore, the adaptive nature of the bounded rationality framework lets the robot cope with unreliable or incorrect imaginations by selectively sampling a few trajectories in the vicinity of the default policy. Our approach uses a diffusion model for map prediction, together with sampling-based planning and B-spline trajectory optimization, to generate the default policy. Extensive evaluations show that the context-generative policy outperforms the baseline methods in identifying and avoiding unseen obstacles. Additionally, real-world experiments with Crazyflie drones demonstrate the adaptability of the proposed method, even in environments outside the training distribution.
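The abstract does not spell out the formulation, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes (hypothetically) that trajectories are uniform cubic B-splines over 2D grid control points, that the default policy is one set of control points, that the planning map fuses observed occupancy with a predicted ("imagined") occupancy for unknown cells, and that a few candidates sampled around the default are reweighted by a free-energy style rule p(c) proportional to p0(c) * exp(-beta * cost(c)), as is common in the bounded-rationality literature. No diffusion model is included: the imagined map is hard-coded. The names `cubic_bspline`, `fuse_maps`, `trajectory_cost`, `bounded_rational_plan`, `sigma`, `beta` and `w_len` are all hypothetical.

```python
import numpy as np

# Uniform cubic B-spline basis matrix; rows give coefficients of 1, t, t^2, t^3.
M_BSPLINE = np.array([[1, 4, 1, 0],
                      [-3, 0, 3, 0],
                      [3, -6, 3, 0],
                      [-1, 3, -3, 1]], dtype=float) / 6.0


def cubic_bspline(ctrl, samples_per_seg=10):
    """Sample a uniform cubic B-spline from control points ctrl of shape (N, 2), N >= 4."""
    t = np.linspace(0.0, 1.0, samples_per_seg, endpoint=False)
    T = np.stack([np.ones_like(t), t, t**2, t**3], axis=1)  # (S, 4)
    # Each segment is blended from 4 consecutive control points.
    segments = [T @ M_BSPLINE @ ctrl[i:i + 4] for i in range(len(ctrl) - 3)]
    return np.vstack(segments)


def fuse_maps(observed, imagined):
    """Use observed occupancy where known (-1 marks unknown), predicted occupancy elsewhere."""
    return np.where(observed >= 0, observed, imagined)


def trajectory_cost(traj, occ, w_len=0.1):
    """Occupancy summed along the sampled path (grid coords: row, col) plus a length penalty."""
    idx = np.clip(np.round(traj).astype(int), 0, np.array(occ.shape) - 1)
    collision = occ[idx[:, 0], idx[:, 1]].sum()
    length = np.linalg.norm(np.diff(traj, axis=0), axis=1).sum()
    return collision + w_len * length


def bounded_rational_plan(default_ctrl, occ, n_samples=16, sigma=1.0, beta=2.0, seed=0):
    """Sample a few control-point sets near the default and pick one by free-energy weighting."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=(n_samples,) + default_ctrl.shape)
    noise[:, [0, -1]] = 0.0  # leave the first and last control points unperturbed
    noise[0] = 0.0           # candidate 0 is the default policy itself
    candidates = default_ctrl[None] + noise
    costs = np.array([trajectory_cost(cubic_bspline(c), occ) for c in candidates])
    # Candidates come from p0 (a Gaussian around the default); reweighting by
    # exp(-beta * cost) approximates p(c) proportional to p0(c) * exp(-beta * cost(c)).
    w = np.exp(-beta * (costs - costs.min()))
    w /= w.sum()
    choice = rng.choice(n_samples, p=w)  # stochastic, bounded-rational selection
    return candidates[choice], costs, w


# Toy 20x20 example; every value here is made up for illustration.
observed = -np.ones((20, 20))
observed[:, :8] = 0.0              # left strip observed and free
imagined = np.zeros((20, 20))
imagined[8:12, 10:14] = 0.9        # "imagined" obstacle in the unseen region
occ = fuse_maps(observed, imagined)
default_ctrl = np.array([[10, 1], [10, 5], [10, 9], [10, 13], [10, 17], [10, 19]], dtype=float)
chosen, costs, w = bounded_rational_plan(default_ctrl, occ)
```

In this sketch, a small `beta` keeps the selection close to a random draw around the default policy, while a large `beta` approaches greedy cost minimization over the sampled candidates; that trade-off is one plausible way to hedge against an imagined map that may be wrong.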


