Intelligent Decision-Making for 3-Dimensional Dynamic Obstacle Avoidance of UAV Based on Deep Reinforcement Learning

2019 11th International Conference on Wireless Communications and Signal Processing (WCSP). Pub Date: 2019-10-01. DOI: 10.1109/WCSP.2019.8928110

Xiao Han, Jing Wang, Jiayin Xue, Qinyu Zhang

Citations: 17

Abstract

With the growing use of UAVs in reconnaissance, agriculture, logistics and entertainment, autonomous collision avoidance during flight has become a necessary capability for modern UAVs to sense their surroundings and guarantee their own safety. Autonomous obstacle avoidance is a typical agent decision-making problem. Unfortunately, existing traditional decision-making methods perform poorly in this domain; in particular, they cannot meet the requirements of three-dimensional UAV obstacle avoidance. We therefore introduce deep reinforcement learning (DRL) into autonomous obstacle avoidance. We model the obstacle avoidance process as a Markov Decision Process and use a decision-maker composed of two joint neural network estimators, whose input is omnidirectional sonar readings and whose output is a value function estimating future rewards. We also propose an adaptation of the memory replay procedure to optimize sampling: each transition is assigned a weight, and transitions are sampled according to these weights. Our method is applied in a three-dimensional physics environment that contains both random dynamic obstacles and floating, bouncing obstacles. The drone's goal is to reach the terminal point without crashing. In our simulations, the Double Q-learning method with priority sampling achieves the best performance among the compared methods. Compared with traditional algorithms, the proposed algorithm not only ensures decision quality, enabling the agent to learn the optimal strategy, but also effectively improves task performance and decision-making efficiency. Simulation results demonstrate its effectiveness.
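The abstract names two ingredients, a Double Q-learning decision-maker and a replay memory that weights transitions and samples them accordingly, but does not give their exact form. The Python sketch below shows one common way to realize each, under stated assumptions: the two estimators are taken to be an online network that selects the next action and a target network that evaluates it (the Double DQN target), and the weighting is assumed to be proportional prioritized replay, with priority (|δ| + ε)^α and importance-sampling weights. The Q-functions are plain callables standing in for neural networks, and all names (`double_q_target`, `PrioritizedReplay`, `alpha`, `beta`, `eps`) are hypothetical, not taken from the paper.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical transition layout (s, a, r, s', done); a state could be a
# vector of omnidirectional sonar readings.
Transition = Tuple[Sequence[float], int, float, Sequence[float], bool]
QFunction = Callable[[Sequence[float]], List[float]]  # stand-in for a neural net


def double_q_target(reward: float, next_state: Sequence[float], done: bool,
                    q_online: QFunction, q_target: QFunction,
                    gamma: float = 0.99) -> float:
    """Double Q-learning target: the online estimator picks the next action,
    the target estimator scores it."""
    if done:  # terminal transition (goal reached or crash): no bootstrapping
        return reward
    online_values = q_online(next_state)
    best_action = max(range(len(online_values)), key=online_values.__getitem__)
    return reward + gamma * q_target(next_state)[best_action]


class PrioritizedReplay:
    """Assumed proportional prioritization: P(i) = p_i / sum_k p_k,
    with p_i = (|td_error_i| + eps) ** alpha."""

    def __init__(self, capacity: int, alpha: float = 0.6,
                 eps: float = 1e-3, seed: int = 0):
        self.capacity = capacity
        self.alpha = alpha
        self.eps = eps
        self.rng = random.Random(seed)
        self.data: List[Transition] = []
        self.priorities: List[float] = []
        self.next_slot = 0  # ring-buffer write position

    def add(self, transition: Transition) -> None:
        # New transitions get the current maximum priority so they are
        # likely to be replayed before their TD error is known.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:  # buffer full: overwrite the oldest entry
            self.data[self.next_slot] = transition
            self.priorities[self.next_slot] = p
        self.next_slot = (self.next_slot + 1) % self.capacity

    def sample(self, batch_size: int, beta: float = 0.4):
        """Return (indices, transitions, importance-sampling weights)."""
        n = len(self.data)
        total = sum(self.priorities)
        probs = [p / total for p in self.priorities]
        indices = self.rng.choices(range(n), weights=probs, k=batch_size)
        # IS weights (n * P(i)) ** -beta correct the bias of non-uniform
        # sampling; dividing by the max keeps them in (0, 1].
        weights = [(n * probs[i]) ** (-beta) for i in indices]
        max_w = max(weights)
        return indices, [self.data[i] for i in indices], [w / max_w for w in weights]

    def update(self, indices: List[int], td_errors: List[float]) -> None:
        # Call after a learning step with the new TD errors of the sampled batch.
        for i, delta in zip(indices, td_errors):
            self.priorities[i] = (abs(delta) + self.eps) ** self.alpha
```

For example, with `q_online = lambda s: [1.0, 3.0, 2.0]` and `q_target = lambda s: [0.5, 1.5, 4.0]`, `double_q_target(1.0, s, False, q_online, q_target, gamma=0.9)` selects action 1 and returns about 2.35, whereas a standard max-based target would use 4.0 and return about 4.6. Decoupling action selection from evaluation in this way is what reduces the overestimation bias of plain Q-learning.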
