Computational Intelligence Interception Guidance Law Using Online Off-Policy Integral Reinforcement Learning

IF 1.9 | CAS Zone 3 (Computer Science) | JCR Q3 (Automation & Control Systems)
Qi Wang, Zhizhong Liao
Journal of Systems Engineering and Electronics. Published 2024-06-06. DOI: https://doi.org/10.23919/jsee.2024.000067
Citations: 0

Abstract

Computational Intelligence Interception Guidance Law Using Online Off-Policy Integral Reinforcement Learning
The missile interception problem can be regarded as a two-player zero-sum differential game, whose solution depends on solving the Hamilton-Jacobi-Isaacs (HJI) equation. Because the HJI equation is nonlinear, it has been shown that a closed-form solution cannot be obtained, and many iterative algorithms have been proposed to solve it. The simultaneous policy updating algorithm (SPUA) is an effective algorithm for solving the HJI equation, but it is an on-policy integral reinforcement learning (IRL) method. Implementing SPUA online requires the disturbance signals to be adjustable, which is unrealistic. This paper proposes an off-policy IRL algorithm based on SPUA that uses no knowledge of the system dynamics. A neural-network-based online adaptive critic implementation of the off-policy IRL algorithm is then presented. Based on the online off-policy IRL method, a computational intelligence interception guidance (CIIG) law is developed for intercepting highly maneuvering targets. Because the method is model-free, interception can be achieved using only system data measured online. The effectiveness of the CIIG law is verified in two missile-target engagement scenarios.
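
For readers unfamiliar with the terminology, the following is a sketch of the standard textbook form of the two-player zero-sum game and its HJI equation. It is not taken from the paper: the affine dynamics $f$, $g$, $k$, the cost terms $Q$, $R$, the attenuation level $\gamma$, and the reinforcement interval $T$ are all assumed notation, with $u$ the control (missile) input and $w$ the disturbance (target maneuver).

```latex
% Assumed dynamics and cost (illustrative notation, not from the paper)
\dot{x} = f(x) + g(x)\,u + k(x)\,w, \qquad
J(x_0, u, w) = \int_0^{\infty} \left( Q(x) + u^{\top} R\, u - \gamma^{2} w^{\top} w \right) dt

% Saddle-point policies in terms of the optimal value function V^*
u^{*} = -\tfrac{1}{2} R^{-1} g^{\top}(x)\, \nabla V^{*}(x), \qquad
w^{*} = \tfrac{1}{2\gamma^{2}}\, k^{\top}(x)\, \nabla V^{*}(x)

% HJI equation obtained by substituting u^* and w^* into the Hamiltonian
0 = Q(x) + \nabla V^{*\top} f(x)
    - \tfrac{1}{4}\, \nabla V^{*\top} g(x) R^{-1} g^{\top}(x)\, \nabla V^{*}
    + \tfrac{1}{4\gamma^{2}}\, \nabla V^{*\top} k(x) k^{\top}(x)\, \nabla V^{*}

% Integral reinforcement (IRL) form of the Bellman equation for fixed policies u, w
V\bigl(x(t)\bigr) = \int_{t}^{t+T} \left( Q(x) + u^{\top} R\, u - \gamma^{2} w^{\top} w \right) d\tau
    + V\bigl(x(t+T)\bigr)
```

The last relation contains no $f$, $g$, or $k$, which is why integral reinforcement learning schemes can evaluate a policy from measured trajectory data alone. In the on-policy setting the data must be generated by the very $u$ and $w$ being evaluated, which is the adjustability requirement on the disturbance that the abstract calls unrealistic.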
Source journal
Journal of Systems Engineering and Electronics (Engineering & Technology: Engineering, Electrical & Electronic)
CiteScore: 4.10
Self-citation rate: 14.30%
Articles published: 131
Review time: 7.5 months