Reinforcement Learning-Based 3-D Sliding Mode Interception Guidance via Proximal Policy Optimization

Jianguo Guo;Mengxuan Li;Zongyi Guo;Zhiyong She
Journal: IEEE Journal on Miniaturization for Air and Space Systems, vol. 4, no. 4, pp. 423-430
Published: 2023-10-17
DOI: 10.1109/JMASS.2023.3325054
URL: https://ieeexplore.ieee.org/document/10287104/
Citations: 0

Abstract

This article proposes a novel 3-D sliding mode interception guidance law for maneuvering targets, using reinforcement learning (RL) to improve guidance accuracy and reduce chattering. The problem of intercepting a maneuvering target is formulated as a Markov decision process whose reward function evaluates the miss distance and the chattering of the line-of-sight (LOS) angular rate. From this formulation, the authors propose a reward-function design framework intended to apply to RL-based guidance problems in general. The proximal policy optimization (PPO) algorithm, reported to train satisfactorily, is then used to learn an action policy that maps the observed engagement states to the sliding mode interception guidance command. Finally, numerical simulations and comparisons are conducted to demonstrate the effectiveness of the proposed guidance law.
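The abstract does not give the reward expression or the training details, so the following is only a minimal sketch of the two ingredients it names: a reward that scores miss distance and LOS angular rate chattering, and the clipped surrogate objective that PPO maximizes. The function names, the exponential accuracy term, the weights `w_miss` and `w_chatter`, and the `scale` parameter are all assumptions made for illustration, not the authors' design. The clipped objective follows the standard PPO form.

```python
import math


def guidance_reward(miss_distance, los_rates, w_miss=1.0, w_chatter=0.1, scale=10.0):
    """Hypothetical episode reward (illustrative only, not the paper's formula).

    miss_distance: final miss distance in meters (non-negative).
    los_rates: sequence of LOS angular rate samples along the trajectory.
    """
    # Accuracy term: equals 1 at zero miss distance and decays toward 0 as it grows.
    accuracy = math.exp(-miss_distance / scale)
    # Chattering term: mean absolute change of the LOS rate between consecutive steps.
    diffs = [abs(b - a) for a, b in zip(los_rates, los_rates[1:])]
    chatter = sum(diffs) / len(diffs) if diffs else 0.0
    return w_miss * accuracy - w_chatter * chatter


def ppo_clip_objective(ratios, advantages, eps=0.2):
    """Standard PPO clipped surrogate, averaged over samples.

    ratios: probability ratios pi_new(a|s) / pi_old(a|s).
    advantages: advantage estimates for the same samples.
    """
    if len(ratios) != len(advantages) or not ratios:
        raise ValueError("ratios and advantages must be non-empty and equal in length")
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped = max(1.0 - eps, min(r, 1.0 + eps))
        # Taking the minimum removes the incentive to push the ratio outside the clip range.
        total += min(r * a, clipped * a)
    return total / len(ratios)
```

Under these assumptions, two trajectories with the same miss distance are ranked by how smooth their LOS rate history is, which is the trade-off between accuracy and chattering that the abstract describes.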