An Off-COMA Algorithm for Multi-UCAV Intelligent Combat Decision-Making

2022 4th International Conference on Data-driven Optimization of Complex Systems (DOCS) Pub Date : 2022-10-28 DOI:10.1109/DOCS55193.2022.9967776

Zhengkang Shi, Jingcheng Wang, Hongyuan Wang



Abstract

Unmanned combat aerial vehicles (UCAVs) play an important role in modern warfare, and their level of intelligent decision-making urgently needs to be improved. In this paper, a simplified multi-UCAV combat environment is established and modeled as a multi-agent Markov game. The multi-UCAV combat problem poses several difficulties, including strong randomness and complexity, sparse rewards, and the lack of strong opponents for training. To address these problems, an algorithm called Off Counterfactual Multi-Agent (Off-COMA) is proposed. The algorithm extends COMA to an off-policy version that can reuse historical data for training, which improves data utilization. In addition, Off-COMA uses an improved prioritized experience replay method to deal with sparse rewards. The paper also presents an asymmetric policy replay self-play method, which provides a guarantee that the algorithm can generate a powerful policy. Finally, a comparison with several classical multi-agent reinforcement learning algorithms verifies the superiority of Off-COMA in solving the multi-UCAV combat problem.
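
The abstract does not spell out the Off-COMA update itself, so the following is only a minimal sketch of the counterfactual advantage used by the original (on-policy) COMA algorithm that Off-COMA builds on. The function name `counterfactual_advantage` and its arguments are hypothetical, chosen for illustration; the off-policy extension, the improved prioritized replay, and the self-play scheme described in the paper are not reproduced here.

```python
from typing import Sequence


def counterfactual_advantage(
    q_values: Sequence[float],
    policy_probs: Sequence[float],
    chosen_action: int,
) -> float:
    """COMA-style advantage for a single agent (illustrative sketch).

    q_values[k]     : centralized critic estimate of the joint-action value when
                      this agent takes action k and the other agents' actions
                      are held fixed.
    policy_probs[k] : this agent's policy probability for action k.
    chosen_action   : index of the action the agent actually took.
    """
    if len(q_values) != len(policy_probs):
        raise ValueError("q_values and policy_probs must have the same length")
    if abs(sum(policy_probs) - 1.0) > 1e-6:
        raise ValueError("policy_probs must sum to 1")

    # Counterfactual baseline: marginalize out this agent's own action
    # under its policy while keeping the other agents' actions fixed.
    baseline = sum(p * q for p, q in zip(policy_probs, q_values))

    # Advantage: how much better the chosen action is than the baseline.
    return q_values[chosen_action] - baseline


if __name__ == "__main__":
    q = [1.0, 2.0, 4.0]       # hypothetical critic outputs for 3 actions
    pi = [0.5, 0.25, 0.25]    # hypothetical policy for the same agent
    print(counterfactual_advantage(q, pi, chosen_action=2))
```

Because the baseline depends only on the other agents' actions and this agent's policy, the policy-weighted average of the advantage over the agent's own actions is zero, which is what lets it isolate each agent's contribution to the shared reward.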
