Attention-Based Value Classification Reinforcement Learning for Collision-Free Robot Navigation
Authors: Chao Sun; Xing Wu; Yuanda Wang; Changyin Sun
Journal: IEEE Transactions on Intelligent Vehicles, vol. 9, no. 11, pp. 6898-6911
Publication date: 2024-04-18
DOI: 10.1109/TIV.2024.3391007
URL: https://ieeexplore.ieee.org/document/10504956/
Impact factor: 14.0 (JCR Q1, Computer Science, Artificial Intelligence)
Citations: 0
Abstract
Collision avoidance is a crucial technique for achieving safe and efficient robotic vehicle navigation in unknown environments. However, unpredictable moving obstacles in dynamic scenarios usually increase the difficulty and complexity of collision avoidance for robotic vehicles. To enhance the stability of collision avoidance and boost its adaptability to uncertain dynamic scenes, a new attention-based value classification actor-critic (AVCAC) architecture is proposed. It is an end-to-end robot navigation model that uses imperfect local observations to directly plan accurate collision-free motion commands. First, we design a value-classified rollout replay buffer that categorizes experiences into different pools. This prevents the overfitting or bias that may result from repeatedly sampling experiences of a certain type during policy learning. Then, we improve the conventional actor-critic network with a multi-head local attention module to extract local observations at the entity level. This way, the collision avoidance system can focus on key environmental features, operate more efficiently, and respond more swiftly to dynamic changes in the environment. Moreover, a lookahead multi-step prediction (LMP) reward setting is devised in the AVCAC-based reinforcement learning (RL) framework to facilitate more informed and forward-looking decision-making. Finally, policy entropy (PE) and policy delay (PD) are extended to the AVCAC model to enhance policy exploration and make the policy more robust. Extensive experimental results show that our method can generate time-efficient, collision-free guide paths in complex dynamic environments.
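The abstract does not give implementation details for the value-classified replay buffer, so the following is only a minimal sketch of the general idea: experiences are sorted into pools by a scalar value, and batches are drawn evenly across pools so that one experience type cannot dominate training. The class name ValueClassifiedBuffer, the threshold-based binning, the per-pool capacity, and the round-robin sampling rule are all assumptions made for illustration, not the authors' method.

```python
import random
from bisect import bisect_right
from collections import deque


class ValueClassifiedBuffer:
    """Hypothetical replay buffer that bins experiences by a scalar value.

    Illustrative only: the binning rule and sampling scheme are assumptions,
    not taken from the paper.
    """

    def __init__(self, boundaries, capacity_per_pool=10_000, seed=None):
        # n sorted thresholds split the value axis into n + 1 pools.
        self.boundaries = sorted(boundaries)
        self.pools = [
            deque(maxlen=capacity_per_pool)
            for _ in range(len(self.boundaries) + 1)
        ]
        self.rng = random.Random(seed)

    def pool_index(self, value):
        # A value equal to a threshold falls into the pool above it.
        return bisect_right(self.boundaries, value)

    def add(self, experience, value):
        # Oldest entries are evicted automatically once a pool is full.
        self.pools[self.pool_index(value)].append(experience)

    def sample(self, batch_size):
        # Cycle over the non-empty pools so each contributes equally,
        # regardless of how many experiences it holds.
        non_empty = [pool for pool in self.pools if pool]
        if not non_empty:
            raise ValueError("buffer is empty")
        return [
            self.rng.choice(non_empty[i % len(non_empty)])
            for i in range(batch_size)
        ]


if __name__ == "__main__":
    buf = ValueClassifiedBuffer(boundaries=[0.0, 1.0], seed=0)
    for step in range(100):
        buf.add(("low", step), value=-1.0)  # many low-value experiences
    buf.add(("mid", 0), value=0.5)
    buf.add(("high", 0), value=2.0)
    print(buf.sample(6))
```

In this sketch, a batch of six contains two experiences from each pool even though the low-value pool holds 100 entries and the others hold one each. Uniform sampling from a single buffer would almost always return low-value experiences in that situation.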
About the journal:
The IEEE Transactions on Intelligent Vehicles (T-IV) is a premier platform for publishing peer-reviewed articles that present innovative research concepts, application results, significant theoretical findings, and application case studies in the field of intelligent vehicles. With a particular emphasis on automated vehicles within roadway environments, T-IV aims to raise awareness of pressing research and application challenges.
Our focus is on providing critical information to the intelligent vehicle community, serving as a dissemination vehicle for IEEE ITS Society members and others interested in learning about the state-of-the-art developments and progress in research and applications related to intelligent vehicles. Join us in advancing knowledge and innovation in this dynamic field.