Intelligent Decision-Making System of Air Defense Resource Allocation via Hierarchical Reinforcement Learning

Impact Factor: 5.0 · CAS Tier 2 (Computer Science) · JCR Q1 (Computer Science, Artificial Intelligence)
Minrui Zhao, Gang Wang, Qiang Fu, Wen Quan, Quan Wen, Xiaoqiang Wang, Tengda Li, Yu Chen, Shan Xue, Jiaozhi Han
{"title":"Intelligent Decision-Making System of Air Defense Resource Allocation via Hierarchical Reinforcement Learning","authors":"Minrui Zhao,&nbsp;Gang Wang,&nbsp;Qiang Fu,&nbsp;Wen Quan,&nbsp;Quan Wen,&nbsp;Xiaoqiang Wang,&nbsp;Tengda Li,&nbsp;Yu Chen,&nbsp;Shan Xue,&nbsp;Jiaozhi Han","doi":"10.1155/2024/7777050","DOIUrl":null,"url":null,"abstract":"<div>\n <p>Intelligent decision-making in air defense operations has attracted wide attention from researchers. Facing complex battlefield environments, existing decision-making algorithms fail to make targeted decisions according to the hierarchical decision-making characteristics of air defense operational command and control. What’s worse, in the process of problem-solving, these algorithms are beset by defects such as dimensional disaster and poor real-time performance. To address these problems, a new hierarchical reinforcement learning algorithm named Hierarchy Asynchronous Advantage Actor-Critic (H-A3C) is developed. This algorithm is designed to have a hierarchical decision-making framework considering the characteristics of air defense operations and employs the hierarchical reinforcement learning method for problem-solving. With a hierarchical decision-making capability similar to that of human commanders in decision-making, the developed algorithm produces many new policies during the learning process. The features of air situation information are extracted using the bidirectional-gated recurrent unit (Bi-GRU) network, and then the agent is trained using the H-A3C algorithm. In the training process, the multihead attention mechanism and the event-based reward mechanism are introduced to facilitate the training. In the end, the proposed H-A3C algorithm is verified in a digital battlefield environment, and the results prove its advantages over existing algorithms.</p>\n </div>","PeriodicalId":14089,"journal":{"name":"International Journal of Intelligent Systems","volume":null,"pages":null},"PeriodicalIF":5.0000,"publicationDate":"2024-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1155/2024/7777050","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Intelligent Systems","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1155/2024/7777050","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Intelligent decision-making in air defense operations has attracted wide attention from researchers. In complex battlefield environments, existing decision-making algorithms fail to make targeted decisions that reflect the hierarchical command-and-control structure of air defense operations. Moreover, during problem-solving these algorithms suffer from defects such as the curse of dimensionality and poor real-time performance. To address these problems, a new hierarchical reinforcement learning algorithm named Hierarchy Asynchronous Advantage Actor-Critic (H-A3C) is developed. The algorithm adopts a hierarchical decision-making framework tailored to the characteristics of air defense operations and applies hierarchical reinforcement learning to solve the allocation problem. With a hierarchical decision-making capability similar to that of human commanders, it produces many new policies during the learning process. Features of the air-situation information are extracted with a bidirectional gated recurrent unit (Bi-GRU) network, and the agent is then trained with the H-A3C algorithm. A multi-head attention mechanism and an event-based reward mechanism are introduced to facilitate training. Finally, the proposed H-A3C algorithm is verified in a digital battlefield environment, and the results demonstrate its advantages over existing algorithms.
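The abstract names the main components (Bi-GRU feature extraction, multi-head attention, and a hierarchical actor-critic) without implementation details. The PyTorch sketch below illustrates one plausible way to wire such pieces together: an encoder that reads a set of tracked targets, and a two-level policy whose high-level head selects a sub-task and whose low-level head selects a concrete allocation action. All class names, layer sizes, and the option/action split are assumptions for illustration, not the authors' implementation; the A3C training loop and the event-based reward are omitted.

```python
# Illustrative sketch only: Bi-GRU encoder + multi-head self-attention feeding
# hierarchical actor-critic heads. Names and dimensions are assumptions.
import torch
import torch.nn as nn


class AirSituationEncoder(nn.Module):
    """Encodes a sequence of air-situation feature vectors (one per tracked target)."""

    def __init__(self, feat_dim: int, hidden: int = 64, heads: int = 4):
        super().__init__()
        self.bigru = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.MultiheadAttention(2 * hidden, heads, batch_first=True)

    def forward(self, tracks: torch.Tensor) -> torch.Tensor:
        # tracks: (batch, n_tracks, feat_dim)
        h, _ = self.bigru(tracks)        # (batch, n_tracks, 2 * hidden)
        h, _ = self.attn(h, h, h)        # self-attention across tracked targets
        return h.mean(dim=1)             # pooled situation embedding


class HierarchicalActorCritic(nn.Module):
    """Two-level policy: a high-level head picks a sub-task (e.g. which threat
    group to engage); a low-level head picks a concrete allocation action."""

    def __init__(self, feat_dim: int, n_options: int, n_actions: int):
        super().__init__()
        self.encoder = AirSituationEncoder(feat_dim)
        emb = 128  # 2 * hidden of the encoder
        self.high_policy = nn.Linear(emb, n_options)
        # Low-level policy is conditioned on the chosen option (one-hot encoded).
        self.low_policy = nn.Linear(emb + n_options, n_actions)
        self.value = nn.Linear(emb, 1)

    def forward(self, tracks: torch.Tensor):
        z = self.encoder(tracks)
        option_logits = self.high_policy(z)
        # Greedy option choice here for illustration; actor-critic training
        # would sample from the high-level policy distribution instead.
        option = nn.functional.one_hot(
            option_logits.argmax(dim=-1), option_logits.size(-1)
        ).float()
        action_logits = self.low_policy(torch.cat([z, option], dim=-1))
        return option_logits, action_logits, self.value(z)


# Usage: a batch of 2 situations, each with 8 tracked targets of 10 features.
model = HierarchicalActorCritic(feat_dim=10, n_options=3, n_actions=16)
opt_logits, act_logits, value = model(torch.randn(2, 8, 10))
```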


Source journal: International Journal of Intelligent Systems (Engineering & Technology – Computer Science: Artificial Intelligence)
CiteScore: 11.30
Self-citation rate: 14.30%
Articles per year: 304
Review time: 9 months
Journal description: The International Journal of Intelligent Systems serves as a forum for individuals interested in tapping into the vast theories based on intelligent systems construction. With its peer-reviewed format, the journal explores several fascinating editorials written by today's experts in the field. Because new developments are being introduced each day, there is much to be learned: examination, analysis creation, information retrieval, man–computer interactions, and more. The International Journal of Intelligent Systems uses charts and illustrations to demonstrate these ground-breaking issues, and encourages readers to share their thoughts and experiences.