A Deep Reinforcement Learning-Based Intelligent Maneuvering Strategy for the High-Speed UAV Pursuit-Evasion Game

Drones. Pub Date: 2024-07-09. DOI: 10.3390/drones8070309
Tian Yan, Can Liu, Mengjing Gao, Zijian Jiang, Tong Li

Abstract

Given the rapid advancements in kinetic pursuit technology, this paper introduces a maneuvering strategy, denoted LSRC-TD3, that integrates line-of-sight (LOS) angle rate correction with deep reinforcement learning (DRL) for high-speed unmanned aerial vehicle (UAV) pursuit–evasion (PE) game scenarios, with the aim of evading high-speed, highly dynamic pursuers. In the most challenging game situations, where the UAV is at a disadvantage in both speed and maximum available overload, its room to maneuver is severely compressed and evasion becomes significantly harder, placing higher demands on the strategy and timing of trajectory-changing maneuvers. Taking evasion, trajectory constraints, and energy consumption into account, we formulate the reward function by combining “terminal” and “process” rewards with “strong” and “weak” incentive guidance, which reduces the difficulty of early exploration and accelerates convergence of the game network. Additionally, this paper introduces a LOS angle rate correction factor into the twin delayed deep deterministic policy gradient (TD3) algorithm, enhancing the sensitivity of high-speed UAVs to changes in LOS rate and the accuracy of evasion timing, which improves the effectiveness and adaptive capability of the intelligent maneuvering strategy. Monte Carlo simulation results demonstrate that the proposed method achieves a high level of evasion performance, combining energy optimization with the requisite miss distance for high-speed UAVs, and accomplishes efficient evasion under highly challenging PE game scenarios.
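The abstract does not give the LOS rate correction or the reward weights, so the following is only a minimal sketch of the two ingredients it names, under stated assumptions. It assumes a planar engagement, for which the LOS angle rate follows from the standard kinematic relation q̇ = (x·v_y − y·v_x) / (x² + y²), and a reward that adds a small per-step "process" penalty on control effort to a large "terminal" term decided by the miss distance. The function names, weights, and thresholds are hypothetical and are not taken from the paper; the sketch does not reproduce how LSRC-TD3 applies the correction inside TD3.

```python
def los_angle_rate(rel_pos, rel_vel):
    """Planar LOS angle rate in rad/s.

    rel_pos and rel_vel are (x, y) tuples giving the pursuer's position and
    velocity relative to the evader (an assumed convention, not the paper's).
    """
    x, y = rel_pos
    vx, vy = rel_vel
    r2 = x * x + y * y
    if r2 == 0.0:
        raise ValueError("pursuer and evader coincide; LOS is undefined")
    return (x * vy - y * vx) / r2


def step_reward(overload_cmd, max_overload, done, miss_distance, required_miss,
                w_energy=0.01, r_success=10.0, r_fail=-10.0):
    """Hypothetical shaped reward: weak process term plus strong terminal term.

    All weights are illustrative placeholders, not values from the paper.
    """
    # Process ("weak") term: small penalty on normalized control effort,
    # standing in for the energy-consumption consideration.
    process = -w_energy * (overload_cmd / max_overload) ** 2
    if not done:
        return process
    # Terminal ("strong") term: success only if the miss distance is reached.
    terminal = r_success if miss_distance >= required_miss else r_fail
    return process + terminal
```

In this sketch a pursuer 1000 m ahead with a 100 m/s crossing velocity gives a LOS rate of 0.1 rad/s, while a pure head-on closing velocity gives zero. A learning agent could take such a rate as an observation or as a scaling signal, but which of these the paper does is not stated in the abstract.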