Variable reward function-driven strategies for impulsive orbital attack-defense games under multiple constraints and victory conditions

IF 5.9 Q1 ENGINEERING, MULTIDISCIPLINARY
Liran Zhao, Sihan Xu, Qinbo Sun, Zhaohui Dang
{"title":"多约束和多胜利条件下脉冲轨道攻防博弈的可变奖励函数驱动策略","authors":"Liran Zhao,&nbsp;Sihan Xu,&nbsp;Qinbo Sun,&nbsp;Zhaohui Dang","doi":"10.1016/j.dt.2025.05.002","DOIUrl":null,"url":null,"abstract":"<div><div>This paper investigates impulsive orbital attack-defense (AD) games under multiple constraints and victory conditions, involving three spacecraft: attacker, target, and defender. In the AD scenario, the attacker aims to breach the defender's interception to rendezvous with the target, while the defender seeks to protect the target by blocking or actively pursuing the attacker. Four different maneuvering constraints and five potential game outcomes are incorporated to more accurately model AD game problems and increase complexity, thereby reducing the effectiveness of traditional methods such as differential games and game-tree searches. To address these challenges, this study proposes a multi-agent deep reinforcement learning solution with variable reward functions. Two attack strategies, Direct attack (DA) and Bypass attack (BA), are developed for the attacker, each focusing on different mission priorities. Similarly, two defense strategies, Direct interdiction (DI) and Collinear interdiction (CI), are designed for the defender, each optimizing specific defensive actions through tailored reward functions. Each reward function incorporates both process rewards (e.g., distance and angle) and outcome rewards, derived from physical principles and validated via geometric analysis. Extensive simulations of four strategy confrontations demonstrate average defensive success rates of 75% for DI vs. DA, 40% for DI vs. BA, 80% for CI vs. DA, and 70% for CI vs. BA. Results indicate that CI outperforms DI for defenders, while BA outperforms DA for attackers. Moreover, defenders achieve their objectives more effectively under identical maneuvering capabilities. Trajectory evolution analyses further illustrate the effectiveness of the proposed variable reward function-driven strategies. These strategies and analyses offer valuable guidance for practical orbital defense scenarios and lay a foundation for future multi-agent game research.</div></div>","PeriodicalId":58209,"journal":{"name":"Defence Technology(防务技术)","volume":"51 ","pages":"Pages 159-183"},"PeriodicalIF":5.9000,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Variable reward function-driven strategies for impulsive orbital attack-defense games under multiple constraints and victory conditions\",\"authors\":\"Liran Zhao,&nbsp;Sihan Xu,&nbsp;Qinbo Sun,&nbsp;Zhaohui Dang\",\"doi\":\"10.1016/j.dt.2025.05.002\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>This paper investigates impulsive orbital attack-defense (AD) games under multiple constraints and victory conditions, involving three spacecraft: attacker, target, and defender. In the AD scenario, the attacker aims to breach the defender's interception to rendezvous with the target, while the defender seeks to protect the target by blocking or actively pursuing the attacker. Four different maneuvering constraints and five potential game outcomes are incorporated to more accurately model AD game problems and increase complexity, thereby reducing the effectiveness of traditional methods such as differential games and game-tree searches. To address these challenges, this study proposes a multi-agent deep reinforcement learning solution with variable reward functions. 
Two attack strategies, Direct attack (DA) and Bypass attack (BA), are developed for the attacker, each focusing on different mission priorities. Similarly, two defense strategies, Direct interdiction (DI) and Collinear interdiction (CI), are designed for the defender, each optimizing specific defensive actions through tailored reward functions. Each reward function incorporates both process rewards (e.g., distance and angle) and outcome rewards, derived from physical principles and validated via geometric analysis. Extensive simulations of four strategy confrontations demonstrate average defensive success rates of 75% for DI vs. DA, 40% for DI vs. BA, 80% for CI vs. DA, and 70% for CI vs. BA. Results indicate that CI outperforms DI for defenders, while BA outperforms DA for attackers. Moreover, defenders achieve their objectives more effectively under identical maneuvering capabilities. Trajectory evolution analyses further illustrate the effectiveness of the proposed variable reward function-driven strategies. These strategies and analyses offer valuable guidance for practical orbital defense scenarios and lay a foundation for future multi-agent game research.</div></div>\",\"PeriodicalId\":58209,\"journal\":{\"name\":\"Defence Technology(防务技术)\",\"volume\":\"51 \",\"pages\":\"Pages 159-183\"},\"PeriodicalIF\":5.9000,\"publicationDate\":\"2025-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Defence Technology(防务技术)\",\"FirstCategoryId\":\"1087\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2214914725001497\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, MULTIDISCIPLINARY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Defence Technology(防务技术)","FirstCategoryId":"1087","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2214914725001497","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, MULTIDISCIPLINARY","Score":null,"Total":0}
Citations: 0

Abstract

This paper investigates impulsive orbital attack-defense (AD) games under multiple constraints and victory conditions, involving three spacecraft: attacker, target, and defender. In the AD scenario, the attacker aims to breach the defender's interception to rendezvous with the target, while the defender seeks to protect the target by blocking or actively pursuing the attacker. Four different maneuvering constraints and five potential game outcomes are incorporated to more accurately model AD game problems and increase complexity, thereby reducing the effectiveness of traditional methods such as differential games and game-tree searches. To address these challenges, this study proposes a multi-agent deep reinforcement learning solution with variable reward functions. Two attack strategies, Direct attack (DA) and Bypass attack (BA), are developed for the attacker, each focusing on different mission priorities. Similarly, two defense strategies, Direct interdiction (DI) and Collinear interdiction (CI), are designed for the defender, each optimizing specific defensive actions through tailored reward functions. Each reward function incorporates both process rewards (e.g., distance and angle) and outcome rewards, derived from physical principles and validated via geometric analysis. Extensive simulations of four strategy confrontations demonstrate average defensive success rates of 75% for DI vs. DA, 40% for DI vs. BA, 80% for CI vs. DA, and 70% for CI vs. BA. Results indicate that CI outperforms DI for defenders, while BA outperforms DA for attackers. Moreover, defenders achieve their objectives more effectively under identical maneuvering capabilities. Trajectory evolution analyses further illustrate the effectiveness of the proposed variable reward function-driven strategies. These strategies and analyses offer valuable guidance for practical orbital defense scenarios and lay a foundation for future multi-agent game research.
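The abstract states that each strategy's reward function blends process rewards (e.g., distance and angle terms) with sparse outcome rewards. As a rough illustration of that structure, the Python sketch below shows what a Collinear interdiction (CI)-style defender reward might look like. This is a minimal sketch under assumed conventions, not the paper's actual formulation: the function name, the weights w_dist and w_angle, the 1 km distance normalization, and the ±100 terminal values are all hypothetical.

```python
# A minimal sketch of a "process + outcome" reward, assuming the structure
# the abstract describes. All names, weights, scales, and terminal values
# are illustrative assumptions, NOT the paper's actual formulation.
from typing import Optional

import numpy as np


def defender_reward(defender_pos: np.ndarray,
                    attacker_pos: np.ndarray,
                    target_pos: np.ndarray,
                    outcome: Optional[str] = None,
                    w_dist: float = 0.5,
                    w_angle: float = 0.5) -> float:
    """Per-step reward for a hypothetical CI-style defender."""
    # Process term 1: penalize defender-attacker range so the defender closes in.
    d_da = float(np.linalg.norm(attacker_pos - defender_pos))
    r_dist = -w_dist * d_da / 1000.0  # assumed 1 km normalization

    # Process term 2: reward collinearity, i.e., the defender sitting on the
    # attacker-target line (+1 when aligned with it, -1 when opposite).
    v_at = target_pos - attacker_pos
    v_ad = defender_pos - attacker_pos
    cos_angle = float(np.dot(v_at, v_ad) /
                      (np.linalg.norm(v_at) * np.linalg.norm(v_ad) + 1e-9))
    r_angle = w_angle * cos_angle

    # Outcome term: sparse terminal reward for two of the game's end states
    # (the paper defines five possible outcomes; only two are sketched here).
    r_outcome = {"defense_success": 100.0,
                 "attack_success": -100.0}.get(outcome, 0.0)

    return r_dist + r_angle + r_outcome
```

Swapping reward functions per strategy (e.g., replacing the collinearity term above with a pure range-closing term for a Direct interdiction (DI)-style defender) is the sense in which the rewards are "variable"; the dense process terms guide learning between the sparse terminal outcomes.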
Source journal
Defence Technology(防务技术)
Subject areas: Mechanical Engineering; Control and Systems Engineering; Industrial and Manufacturing Engineering
CiteScore: 8.70
Self-citation rate: 0.00%
Annual articles: 728
Review time: 25 days
Journal description: Defence Technology, a peer-reviewed journal, is published monthly and aims to be the leading international academic exchange platform for research related to defence technology. It publishes original research papers with a direct bearing on defence, with balanced coverage of analytical, experimental, numerical-simulation, and applied investigations across various disciplines of science, technology, and engineering.