Multi-Ship Dynamic Weapon-Target Assignment via Cooperative Distributional Reinforcement Learning With Dynamic Reward

IF 5.3 · Zone 3 (Computer Science) · JCR Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

IEEE Transactions on Emerging Topics in Computational Intelligence Pub Date : 2024-09-17 DOI:10.1109/TETCI.2024.3451338

Zhe Peng;Zhifeng Lu;Xiao Mao;Feng Ye;Kuihua Huang;Guohua Wu;Ling Wang

Vol. 9, No. 2, pp. 1843-1859 · Journal Article · https://ieeexplore.ieee.org/document/10681512/

Citation count: 0

Abstract

In fleet air defense, efficiently coordinating multiple ships to complete weapon-target assignment has always been a critical challenge, primarily because each ship has different combat capabilities and duties. Consequently, in the multi-ship dynamic weapon-target assignment (MS-DWTA) problem we propose, the traditional “weapon-target” assignment mode becomes a “ship-weapon-target” assignment mode with a larger solution space. In this problem, different ships possess distinct attributes, such as defense duties, weapon types, and loaded missile quantities. To solve it, we propose an Attention-enhanced multi-agent Distributional reinforcement learning method with Dynamic Reward (ADDR). Unlike standard reinforcement learning methods, ADDR learns to estimate the distribution of the future return rather than only its expectation, enabling better adaptation to air defense scenarios with significant randomness. A multi-head attention network integrates both the ship situation and the target situation to adjust the output of each agent, explicitly accounting for the agent-level impact of each ship on the whole fleet. Moreover, because of missile flight time, ships may not receive rewards immediately after executing actions. To address this delay, we design a dynamic reward mechanism that accurately adjusts the delayed rewards. In extensive simulation experiments, ADDR demonstrates superior performance across multiple evaluation metrics.
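The two ideas the abstract leans on, estimating a return distribution rather than its mean and handling rewards that arrive only after the missile's flight time, can be illustrated with a minimal Python sketch. This is not the ADDR implementation: the abstract does not specify the network, the loss, or the reward rule, so the quantile-midpoint representation, the "credit the outcome back to the launch step" rule, and the names `empirical_quantiles` and `credit_delayed_rewards` are all assumptions made for illustration.

```python
def empirical_quantiles(samples, n_quantiles):
    """Summarize a return distribution by its values at the quantile
    midpoints tau_i = (2i + 1) / (2N), a common distributional-RL choice.
    (Assumed representation; the abstract does not say which one ADDR uses.)
    """
    ordered = sorted(samples)
    values = []
    for i in range(n_quantiles):
        tau = (2 * i + 1) / (2 * n_quantiles)
        idx = min(int(tau * len(ordered)), len(ordered) - 1)
        values.append(ordered[idx])
    return values


def credit_delayed_rewards(launch_steps, flight_steps, outcomes, horizon):
    """Assign each shot's outcome reward to the step at which it was fired.

    A shot launched at step t with flight time f is only resolved at t + f.
    Shots that have not landed before the horizon receive no credit here.
    (Hypothetical rule, used only to show the delayed-reward bookkeeping.)
    """
    per_step = [0.0] * horizon
    for launch, flight, outcome in zip(launch_steps, flight_steps, outcomes):
        if launch + flight < horizon:
            per_step[launch] += outcome
    return per_step


# Two hypothetical actions with the same expected return but different risk.
safe_returns = [0.5] * 1000                   # always yields 0.5
risky_returns = [0.0] * 500 + [1.0] * 500     # yields 0 or 1 equally often

mean_safe = sum(safe_returns) / len(safe_returns)
mean_risky = sum(risky_returns) / len(risky_returns)
quantiles_safe = empirical_quantiles(safe_returns, 4)
quantiles_risky = empirical_quantiles(risky_returns, 4)

# Three shots; the third is still in flight when the episode ends at step 6.
credited = credit_delayed_rewards(
    launch_steps=[0, 1, 3],
    flight_steps=[2, 3, 4],
    outcomes=[1.0, 0.0, 1.0],
    horizon=6,
)
```

Here the two actions are indistinguishable by their mean return, while their quantile summaries differ, which is the extra information a distributional learner can exploit in a highly random engagement. The `credited` list shows the first shot's reward attributed to step 0 even though its outcome is observed two steps later.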

Source journal

IEEE Transactions on Emerging Topics in Computational Intelligence (Mathematics: Control and Optimization)

CiteScore: 10.30

Self-citation rate: 7.50%

Articles published: 147

Journal description: The IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI) publishes original articles on emerging aspects of computational intelligence, including theory, applications, and surveys. TETCI is an electronic-only publication that publishes six issues per year. Authors are encouraged to submit manuscripts on any emerging topic in computational intelligence, especially nature-inspired computing topics not covered by other IEEE Computational Intelligence Society journals. Illustrative examples include glial cell networks, computational neuroscience, brain-computer interfaces, ambient intelligence, non-fuzzy computing with words, artificial life, cultural learning, artificial endocrine networks, social reasoning, artificial hormone networks, and computational intelligence for the IoT and Smart-X technologies.