NPE-DRL: Enhancing Perception Constrained Obstacle Avoidance With Nonexpert Policy Guided Reinforcement Learning

IEEE Transactions on Artificial Intelligence. Pub Date: 2024-09-20. DOI: 10.1109/TAI.2024.3464510

Yuhang Zhang, Chao Yan, Jiaping Xiao, Mir Feroskhan

Volume 6, issue 1, pages 184-198. Publication type: Journal Article. Source: https://ieeexplore.ieee.org/document/10684842/

Citations: 0

Abstract

Obstacle avoidance under constrained visual perception presents a significant challenge, requiring rapid detection and decision-making within partially observable environments, particularly for unmanned aerial vehicles (UAVs) maneuvering agilely in 3-D space. Compared with traditional methods, obstacle avoidance algorithms based on deep reinforcement learning (DRL) comprehend the uncertain operational environment better in an end-to-end manner, reducing computational complexity and enhancing flexibility and scalability. However, the inherent trial-and-error learning mechanism of DRL necessitates numerous iterations for policy convergence, leading to sample inefficiency. Meanwhile, existing sample-efficient obstacle avoidance approaches that leverage imitation learning often rely heavily on offline expert demonstrations, which are not always feasible in hazardous environments. To address these challenges, we propose a novel obstacle avoidance approach based on nonexpert policy enhanced DRL (NPE-DRL). This approach integrates a fundamental DRL framework with prior knowledge derived from nonexpert policy-guided imitation learning. During the training phase, the agent starts by imitating, online, the actions generated by the nonexpert policy during interactions and progressively shifts toward autonomously exploring the environment to generate the optimal policy. Simulation and physical experiments validate that our approach improves sample efficiency and achieves a better exploration–exploitation balance in both virtual and real-world flights. Additionally, our NPE-DRL-based obstacle avoidance approach shows better adaptability in complex environments characterized by larger scales and denser obstacle configurations, demonstrating a significant improvement in UAVs’ obstacle avoidance capability. Code is available at https://github.com/zzzzzyh111/NonExpert-Guided-Visual-UAV-Navigation-Gazebo.
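
The shift from imitation to autonomous exploration described in the abstract can be illustrated with a decaying imitation weight. The sketch below is only an illustration and is not taken from the paper or its repository: the linear schedule, the function names `imitation_weight` and `combined_loss`, and the parameter `decay_steps` are all assumptions made for this example, and the authors' actual schedule and loss may differ.

```python
def imitation_weight(step: int, decay_steps: int, initial: float = 1.0) -> float:
    """Hypothetical schedule: decay linearly from `initial` to 0 over `decay_steps`."""
    if decay_steps <= 0:
        raise ValueError("decay_steps must be positive")
    remaining = max(0.0, 1.0 - step / decay_steps)
    return initial * remaining


def combined_loss(rl_loss: float, imitation_loss: float,
                  step: int, decay_steps: int) -> float:
    """Assumed objective: RL loss plus an imitation term whose weight shrinks.

    Early in training the imitation term (distance to the nonexpert
    policy's action) dominates; once the weight reaches 0, only the
    RL loss remains and the agent learns from its own exploration.
    """
    weight = imitation_weight(step, decay_steps)
    return rl_loss + weight * imitation_loss


if __name__ == "__main__":
    # Print how the blend changes over a hypothetical 100-step decay window.
    for step in (0, 25, 50, 100, 150):
        print(step, imitation_weight(step, 100), combined_loss(2.0, 4.0, step, 100))
```

With a schedule like this, the nonexpert policy only needs to be good enough to steer early exploration, since its influence vanishes once `step` reaches `decay_steps`.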


Source journal

IEEE Transactions on Artificial Intelligence

CiteScore

7.70

Self-citation rate

0.00%

Number of published articles