Attention-Based Value Classification Reinforcement Learning for Collision-Free Robot Navigation

IF 14 | Zone 1, Engineering Technology | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Intelligent Vehicles Pub Date : 2024-04-18 DOI:10.1109/TIV.2024.3391007

Chao Sun;Xing Wu;Yuanda Wang;Changyin Sun

IEEE Transactions on Intelligent Vehicles, vol. 9, no. 11, pp. 6898-6911. Full text: https://ieeexplore.ieee.org/document/10504956/

Citations: 0

Abstract

Collision avoidance is crucial for safe and efficient robotic vehicle navigation in unknown environments. However, unpredictable moving obstacles in dynamic scenarios usually increase the difficulty and complexity of collision avoidance for robotic vehicles. To enhance the stability of collision avoidance and boost its adaptability to uncertain dynamic scenes, a new attention-based value classification actor-critic (AVCAC) architecture is proposed. It is an end-to-end robot navigation model that uses imperfect local observations to directly plan accurate collision-free motion commands. First, we design a value-classified rollout replay buffer that categorizes experiences into different pools. It can prevent the overfitting or bias that may result from repeatedly sampling experiences of a certain type during policy learning. Then, we improve the conventional actor-critic network with a multi-head local attention module to extract the local observations at the entity level. This way, the collision avoidance system can focus on key environmental features, operate more efficiently, and respond more swiftly to dynamic changes in the environment. Moreover, a lookahead multi-step prediction (LMP) reward setting is devised in the AVCAC-based reinforcement learning (RL) framework to facilitate more informed and forward-looking decision-making. Finally, policy entropy (PE) and policy delay (PD) are extended to the AVCAC model to enhance policy exploration and make the policy more robust. Extensive experimental results show that our method can generate time-efficient, collision-free paths in complex dynamic environments.
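The abstract does not say how experiences are classified or how the pools are sampled, so the following is only a minimal sketch of the general idea behind a value-classified replay buffer, not the authors' implementation. It assumes each transition comes with a scalar value (for example a return estimate), that fixed thresholds split those values into pools, and that each batch draws roughly equal shares from every non-empty pool. The class name `ValueClassifiedBuffer`, the thresholds, and the equal-share rule are all hypothetical.

```python
import random
from collections import deque


class ValueClassifiedBuffer:
    """Hypothetical replay buffer that bins experiences by a scalar value."""

    def __init__(self, thresholds, capacity_per_pool=10_000, seed=None):
        # Assumption: sorted thresholds split the value axis into
        # len(thresholds) + 1 pools; the paper's actual rule may differ.
        self.thresholds = sorted(thresholds)
        self.pools = [deque(maxlen=capacity_per_pool)
                      for _ in range(len(self.thresholds) + 1)]
        self.rng = random.Random(seed)

    def pool_index(self, value):
        # Return the index of the first threshold the value falls below,
        # or the last pool if it is not below any threshold.
        for i, t in enumerate(self.thresholds):
            if value < t:
                return i
        return len(self.thresholds)

    def add(self, transition, value):
        # Store the transition in the pool that matches its value.
        self.pools[self.pool_index(value)].append(transition)

    def sample(self, batch_size):
        # Draw near-equal shares from every non-empty pool so that no
        # single experience type dominates the batch.
        non_empty = [p for p in self.pools if p]
        if not non_empty:
            return []
        share, extra = divmod(batch_size, len(non_empty))
        batch = []
        for i, pool in enumerate(non_empty):
            k = share + (1 if i < extra else 0)
            # Sample with replacement so small pools can fill their share.
            batch.extend(self.rng.choices(pool, k=k))
        return batch
```

With thresholds `[-0.5, 0.5]`, a buffer holding 90 mid-value transitions and only 5 low-value and 5 high-value ones would still return a batch of 30 made up of 10 from each pool, which is the kind of rebalancing the abstract attributes to pooling experiences by value.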


Source journal

IEEE Transactions on Intelligent Vehicles (Mathematics: Control and Optimization)

CiteScore: 12.10

Self-citation rate: 13.40%

Articles published: 177

Journal description: The IEEE Transactions on Intelligent Vehicles (T-IV) is a premier platform for publishing peer-reviewed articles that present innovative research concepts, application results, significant theoretical findings, and application case studies in the field of intelligent vehicles. With a particular emphasis on automated vehicles within roadway environments, T-IV aims to raise awareness of pressing research and application challenges. Our focus is on providing critical information to the intelligent vehicle community, serving as a dissemination vehicle for IEEE ITS Society members and others interested in learning about the state-of-the-art developments and progress in research and applications related to intelligent vehicles. Join us in advancing knowledge and innovation in this dynamic field.