UAV Maneuvering Decision-Making Algorithm Based on Deep Reinforcement Learning Under the Guidance of Expert Experience

IF 2.1 · Zone 3, Computer Science · Q3, AUTOMATION & CONTROL SYSTEMS

Journal of Systems Engineering and Electronics Pub Date : 2024-04-23 DOI:10.23919/jsee.2024.000022

Guang Zhan, Kun Zhang, Ke Li, Haiyin Piao



Abstract

Autonomous unmanned aerial vehicle (UAV) manipulation is necessary for the defense department to execute tactical missions given by commanders in the future unmanned battlefield. A large amount of research has been devoted to improving the autonomous decision-making ability of UAVs in an interactive environment, where finding the optimal maneuvering decision-making policy has become one of the key issues for enabling UAV intelligence. In this paper, we propose a maneuvering decision-making algorithm for autonomous air-delivery based on deep reinforcement learning under the guidance of expert experience. First, we refine the guidance-towards-area and guidance-towards-specific-point tasks for the air-delivery process based on traditional air-to-surface fire control methods. We then construct the UAV maneuvering decision-making model based on Markov decision processes (MDPs). In particular, we present a reward shaping method for the guidance-towards-area and guidance-towards-specific-point tasks using a potential-based function and expert-guided advice. The proposed algorithm can accelerate the convergence of the maneuvering decision-making policy and improve the stability of the policy output during the later stage of the training process. The effectiveness of the proposed maneuvering decision-making policy is illustrated by the curves of training parameters and by extensive experimental results from testing the trained policy.
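The abstract does not give the form of the reward shaping function. As a rough illustration only, the sketch below shows the general potential-based shaping technique, in which a term F = γΦ(s') − Φ(s) is added to the environment reward. The potential used here (negative Euclidean distance to a target point), the discount factor value, and all function names are assumptions made for this example, not the authors' method.

```python
import math

GAMMA = 0.99  # discount factor; the value is an assumption for this sketch


def potential(position, target):
    """Hypothetical potential: negative Euclidean distance to the target.

    States closer to the target get a higher (less negative) potential.
    """
    return -math.dist(position, target)


def shaped_reward(env_reward, position, next_position, target, gamma=GAMMA):
    """Return the environment reward plus the potential-based shaping term.

    Shaping term: F = gamma * Phi(s') - Phi(s).
    """
    shaping = gamma * potential(next_position, target) - potential(position, target)
    return env_reward + shaping


if __name__ == "__main__":
    target = (10.0, 0.0)
    start = (0.0, 0.0)
    # Compare one step towards the target with one step away from it.
    print(shaped_reward(0.0, start, (1.0, 0.0), target))
    print(shaped_reward(0.0, start, (-1.0, 0.0), target))
```

With this choice of potential, a step that reduces the distance to the target receives a larger shaped reward than a step that increases it, which is the kind of dense guidance signal that shaping is meant to provide early in training.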
