UAV path planning based on the improved PPO algorithm

2022 Asia Conference on Advanced Robotics, Automation, and Control Engineering (ARACE) Pub Date : 2022-08-01 DOI:10.1109/ARACE56528.2022.00040

Chenyang Qi, Chengfu Wu, Lei Lei, Xiaolu Li, Peiyan Cong


Citations: 3

Abstract

This paper considers the problem of unmanned aerial vehicle (UAV) path planning. Traditional path planning algorithms suffer from low efficiency and poor adaptability, so this paper uses reinforcement learning to perform the path planning. In the classic proximal policy optimization (PPO) algorithm, samples with large rewards in the experience replay buffer can seriously affect training, which degrades the agent's exploration performance and leads to poor convergence in some path planning tasks. To solve these problems, this paper proposes a frequency decomposition PPO algorithm (FD-PPO) and designs a heuristic reward function for the UAV path planning problem. The FD-PPO algorithm decomposes rewards into multi-dimensional frequency rewards and then calculates the frequency return to efficiently guide the UAV through the path planning task. Simulation results show that FD-PPO adapts to complex environments and remains highly stable in continuous state and action spaces. FD-PPO also outperforms the PPO algorithm in path planning.
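The abstract does not specify how the frequency decomposition is carried out, so the following is only a minimal sketch under stated assumptions, not the authors' FD-PPO. It shows the standard PPO clipped surrogate loss and the discounted return, and then one purely hypothetical way to split a scalar reward into several components and compute a per-component return. The function `split_into_bands`, the band edges, and the example rewards are illustrative assumptions and do not come from the paper.

```python
def discounted_returns(rewards, gamma=0.99):
    """Discounted return G_t = r_t + gamma * G_{t+1} for one episode."""
    returns = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def ppo_clip_loss(ratios, advantages, eps=0.2):
    """Standard PPO clipped surrogate, returned as a loss (negated mean)."""
    total = 0.0
    for ratio, adv in zip(ratios, advantages):
        clipped = max(1.0 - eps, min(1.0 + eps, ratio))
        total += min(ratio * adv, clipped * adv)
    return -total / len(ratios)


def split_into_bands(reward, edges):
    """HYPOTHETICAL decomposition (an assumption, not the paper's method).

    Splits |reward| across magnitude bands [0, e1), [e1, e2), ..., [ek, inf)
    so that the signed parts always sum back to the original reward.
    """
    sign = 1.0 if reward >= 0 else -1.0
    remaining = abs(reward)
    parts = []
    lower = 0.0
    for upper in edges:
        part = min(remaining, upper - lower)
        parts.append(sign * part)
        remaining -= part
        lower = upper
    parts.append(sign * remaining)  # whatever exceeds the last edge
    return parts


def per_band_returns(rewards, edges, gamma=0.99):
    """One discounted-return sequence per band (hypothetical 'multi-dimensional' return)."""
    decomposed = [split_into_bands(r, edges) for r in rewards]
    n_bands = len(edges) + 1
    return [
        discounted_returns([step[k] for step in decomposed], gamma)
        for k in range(n_bands)
    ]


if __name__ == "__main__":
    # Small step penalties followed by one large terminal reward (made-up values).
    episode = [-0.1, -0.1, 100.0]
    print(discounted_returns(episode, gamma=0.9))
    print(per_band_returns(episode, edges=[1.0, 10.0], gamma=0.9))
```

Because the discounted return is linear in the rewards, the per-band returns in this sketch sum to the ordinary scalar return at every step. The large terminal reward is confined mostly to the last band, which is one way to picture how a decomposition could keep a few large-reward samples from dominating training. Whether FD-PPO works this way is not stated in the abstract.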

