Learning Implicit Social Navigation Behavior Using Deep Inverse Reinforcement Learning

IF 4.6, Zone 2 (Computer Science), JCR Q2 (Robotics)

IEEE Robotics and Automation Letters, Pub Date: 2025-04-02, DOI: 10.1109/LRA.2025.3557299

Tribhi Kathuria;Ke Liu;Junwoo Jang;X. Jessie Yang;Maani Ghaffari

Volume 10, Issue 5, Pages 5146-5153. Journal Article. https://ieeexplore.ieee.org/document/10947583/

Citations: 0

Abstract



Learning Implicit Social Navigation Behavior Using Deep Inverse Reinforcement Learning

This paper reports on learning a reward map for social navigation in dynamic environments where the robot can reason about its path at any time, given agent trajectories and scene geometry. Humans navigating in dense and dynamic indoor environments often work with several implied social rules. A rule-based approach fails to model all possible interactions between humans, robots, and scenes. We propose a novel Smooth Maximum Entropy Deep Inverse Reinforcement Learning (S-MEDIRL) algorithm that can extrapolate beyond expert demos to better encode scene navigability from few-shot demonstrations. The agent learns to predict the cost maps based on trajectory data as well as scene geometry. The trajectory sampled from the learned cost map is then executed using a local crowd navigation controller. We present results in a photo-realistic simulation environment, with a robot and a human navigating a narrow crossing scenario. The robot implicitly learns to exhibit social behaviors such as yielding to oncoming traffic and avoiding deadlocks. We compare the proposed approach to the popular model-based crowd navigation algorithm ORCA and a rule-based agent that exhibits yielding.
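The abstract does not spell out how S-MEDIRL works, so the following is only a rough orientation. It sketches the generic maximum-entropy IRL update that deep variants typically build on: compute a soft-optimal policy for the current reward, roll it forward to get expected state visitation counts, and move the reward toward the expert's visitation counts. It is not the authors' method. There is no neural network, no smoothing term, no scene geometry and no moving human here; the reward is one free parameter per grid cell, and every name (`neighbors`, `soft_policy`, `expected_visits`, `demo_visits`, `maxent_irl`), the 3x3 grid, the five-action motion model and the shared start cell are assumptions made for illustration.

```python
import numpy as np

# Five deterministic actions: up, down, left, right, stay (an assumption of this sketch).
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1), (0, 0)]

def neighbors(H, W):
    """nxt[s, a] = cell reached from cell s under action a; moving off the grid keeps the agent in place."""
    nxt = np.zeros((H * W, len(MOVES)), dtype=int)
    for r in range(H):
        for c in range(W):
            for a, (dr, dc) in enumerate(MOVES):
                rr = min(max(r + dr, 0), H - 1)
                cc = min(max(c + dc, 0), W - 1)
                nxt[r * W + c, a] = rr * W + cc
    return nxt

def soft_policy(reward, nxt, horizon):
    """Finite-horizon soft value iteration; returns policies[t][s, a] for t = 0..horizon-1."""
    V = np.zeros(reward.size)
    policies = []
    for _ in range(horizon):
        Q = reward[:, None] + V[nxt]              # Q[s, a] = r(s) + V(next state)
        m = Q.max(axis=1, keepdims=True)          # stabilised log-sum-exp
        V = (m + np.log(np.exp(Q - m).sum(axis=1, keepdims=True))).ravel()
        policies.append(np.exp(Q - V[:, None]))   # each row sums to 1
    return policies[::-1]                         # built backwards, so reverse to time order

def expected_visits(policies, nxt, start):
    """Expected number of visits to each cell at steps 0..horizon-1 under the soft policy."""
    d = np.zeros(nxt.shape[0])
    d[start] = 1.0
    total = np.zeros_like(d)
    for pi in policies:
        total += d
        d_next = np.zeros_like(d)
        np.add.at(d_next, nxt, d[:, None] * pi)   # push probability mass along transitions
        d = d_next
    return total

def demo_visits(demos, n_states, horizon):
    """Average visit counts of the demonstrations over the same steps 0..horizon-1."""
    counts = np.zeros(n_states)
    for traj in demos:
        np.add.at(counts, np.asarray(traj[:horizon]), 1.0)
    return counts / len(demos)

def maxent_irl(demos, H, W, horizon, lr=0.1, iters=200):
    """Gradient ascent on the demo log-likelihood; the gradient is expert minus expected visits."""
    nxt = neighbors(H, W)
    reward = np.zeros(H * W)
    mu_expert = demo_visits(demos, H * W, horizon)
    start = demos[0][0]                           # assumes all demos share one start cell
    for _ in range(iters):
        mu_policy = expected_visits(soft_policy(reward, nxt, horizon), nxt, start)
        reward += lr * (mu_expert - mu_policy)
    return reward.reshape(H, W)

# Hypothetical demonstration: along the top row, down the right column, then wait at the corner.
demo = [0, 1, 2, 5, 8, 8, 8]
print(np.round(maxent_irl([demo], H=3, W=3, horizon=6), 2))
```

In this toy setup the cells the demonstration never enters should end up with negative reward, because the policy keeps assigning them some visitation while the expert assigns none. In a deep variant, the same visitation difference would be backpropagated through a network that maps input features to the reward map, rather than applied to a table directly.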


Source journal

IEEE Robotics and Automation Letters (Computer Science: Computer Science Applications)

CiteScore: 9.60

Self-citation rate: 15.40%

Articles published: 1428

About the journal: The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.