SNAP: Successor Entropy based Incremental Subgoal Discovery for Adaptive Navigation

R. Dubey, Samuel S. Sohn, J. Abualdenien, Tyler Thrash, C. Hoelscher, A. Borrmann, Mubbasir Kapadia
{"title":"SNAP:Successor Entropy based Incremental Subgoal Discovery for Adaptive Navigation","authors":"R. Dubey, Samuel S. Sohn, J. Abualdenien, Tyler Thrash, C. Hoelscher, A. Borrmann, Mubbasir Kapadia","doi":"10.1145/3487983.3488292","DOIUrl":null,"url":null,"abstract":"Reinforcement learning (RL) has demonstrated great success in solving navigation tasks but often fails when learning complex environmental structures. One open challenge is to incorporate low-level generalizable skills with human-like adaptive path-planning in an RL framework. Motivated by neural findings in animal navigation, we propose a Successor eNtropy-based Adaptive Path-planning (SNAP) that combines a low-level goal-conditioned policy with the flexibility of a classical high-level planner. SNAP decomposes distant goal-reaching tasks into multiple nearby goal-reaching sub-tasks using a topological graph. To construct this graph, we propose an incremental subgoal discovery method that leverages the highest-entropy states in the learned Successor Representation. The Successor Representation encodes the likelihood of being in a future state given the current state and capture the relational structure of states based on a policy. Our main contributions lie in discovering subgoal states that efficiently abstract the state-space and proposing a low-level goal-conditioned controller for local navigation. Since the basic low-level skill is learned independent of state representation, our model easily generalizes to novel environments without intensive relearning. We provide empirical evidence that the proposed method enables agents to perform long-horizon sparse reward tasks quickly, take detours during barrier tasks, and exploit shortcuts that did not exist during training. Our experiments further show that the proposed method outperforms the existing goal-conditioned RL algorithms in successfully reaching distant-goal tasks and policy learning. To evaluate human-like adaptive path-planning, we also compare our optimal agent with human data and found that, on average, the agent was able to find a shorter path than the human participants.","PeriodicalId":170509,"journal":{"name":"Proceedings of the 14th ACM SIGGRAPH Conference on Motion, Interaction and Games","volume":"52 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 14th ACM SIGGRAPH Conference on Motion, Interaction and Games","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3487983.3488292","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

Reinforcement learning (RL) has demonstrated great success in solving navigation tasks but often fails when learning complex environmental structures. One open challenge is to combine low-level, generalizable skills with human-like adaptive path planning in an RL framework. Motivated by neural findings in animal navigation, we propose Successor eNtropy-based Adaptive Path-planning (SNAP), which combines a low-level goal-conditioned policy with the flexibility of a classical high-level planner. SNAP decomposes distant goal-reaching tasks into multiple nearby goal-reaching sub-tasks using a topological graph. To construct this graph, we propose an incremental subgoal discovery method that leverages the highest-entropy states in the learned Successor Representation. The Successor Representation encodes the likelihood of being in a future state given the current state and captures the relational structure of states under a given policy. Our main contributions lie in discovering subgoal states that efficiently abstract the state space and in proposing a low-level goal-conditioned controller for local navigation. Since the basic low-level skill is learned independently of the state representation, our model generalizes easily to novel environments without intensive relearning. We provide empirical evidence that the proposed method enables agents to perform long-horizon, sparse-reward tasks quickly, take detours during barrier tasks, and exploit shortcuts that did not exist during training. Our experiments further show that the proposed method outperforms existing goal-conditioned RL algorithms both in successfully reaching distant goals and in policy learning. To evaluate human-like adaptive path planning, we also compared our optimal agent with human data and found that, on average, the agent found a shorter path than the human participants.
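The abstract compresses two mechanisms into a few sentences: a Successor Representation (SR), learned by temporal-difference updates, whose entry M(s, s') estimates the expected discounted count of future visits to s' starting from s, and subgoal discovery via the highest-entropy SR rows. The Python below is a minimal, illustrative sketch of that idea only; the tabular setting, the function names, and all hyperparameters are assumptions made for exposition, not the authors' implementation.

```python
import numpy as np

def learn_successor_representation(transitions, n_states, gamma=0.95, alpha=0.1):
    """Learn a tabular Successor Representation M with TD(0).

    M[s, s'] estimates E[ sum_t gamma^t * 1(s_t = s') | s_0 = s ]
    under the behavior policy. For an observed transition (s, s_next):
        M[s] <- M[s] + alpha * (onehot(s) + gamma * M[s_next] - M[s])
    """
    M = np.zeros((n_states, n_states))
    eye = np.eye(n_states)
    for s, s_next in transitions:
        M[s] += alpha * (eye[s] + gamma * M[s_next] - M[s])
    return M

def entropy_subgoals(M, k=2, eps=1e-12):
    """Return the k states whose normalized SR rows have the highest entropy.

    A high-entropy row means many futures are similarly likely from that
    state (e.g. a junction), making it a natural subgoal candidate.
    """
    p = M / (M.sum(axis=1, keepdims=True) + eps)
    h = -(p * np.log(p + eps)).sum(axis=1)
    return np.argsort(h)[-k:][::-1]

# Tiny hypothetical environment: two corridors meeting at a junction
# (state 3). Under a uniform random walk, the junction's SR row should
# rank among the highest-entropy states, flagging it as a subgoal.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4, 6],
             4: [3, 5], 5: [4], 6: [3, 7], 7: [6]}
rng = np.random.default_rng(0)
s, transitions = 0, []
for _ in range(50_000):
    s_next = int(rng.choice(adjacency[s]))
    transitions.append((s, s_next))
    s = s_next

M = learn_successor_representation(transitions, n_states=8)
print(entropy_subgoals(M, k=2))
```

Because the TD loop updates M one transition at a time, the entropy ranking can be recomputed as experience accumulates, which matches the abstract's "incremental" subgoal discovery in spirit, though the paper's exact criterion is not reproduced here.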
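The "flexibility of a classical high-level planner" likewise reduces to graph search over the discovered subgoals: edges connect subgoal pairs that the low-level goal-conditioned policy can reliably travel between, and the resulting node sequence is the chain of nearby sub-tasks handed to that policy. The sketch below is again hypothetical (the paper's edge construction and weights are not given on this page), but it shows why detour behavior comes cheaply: delete a blocked edge and replan.

```python
import heapq

def plan_over_subgoals(graph, start, goal):
    """Dijkstra over a weighted topological graph of subgoal states.

    `graph` maps each node to a list of (neighbor, cost) pairs.
    Returns the cheapest subgoal sequence from start to goal,
    or None if the goal is unreachable.
    """
    frontier = [(0.0, start, [start])]
    done = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in done:
            continue
        done.add(node)
        for nbr, w in graph.get(node, []):
            if nbr not in done:
                heapq.heappush(frontier, (cost + w, nbr, path + [nbr]))
    return None

# Hypothetical graph over four discovered subgoals.
graph = {
    "A": [("B", 1.0), ("C", 2.5)],
    "B": [("A", 1.0), ("D", 1.0)],
    "C": [("A", 2.5), ("D", 1.0)],
    "D": [("B", 1.0), ("C", 1.0)],
}
print(plan_over_subgoals(graph, "A", "D"))   # ['A', 'B', 'D']

# A barrier between B and D: drop that edge and replan for a detour.
graph["B"] = [("A", 1.0)]
graph["D"] = [("C", 1.0)]
print(plan_over_subgoals(graph, "A", "D"))   # ['A', 'C', 'D']
```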