{"title":"多约束和多胜利条件下脉冲轨道攻防博弈的可变奖励函数驱动策略","authors":"Liran Zhao, Sihan Xu, Qinbo Sun, Zhaohui Dang","doi":"10.1016/j.dt.2025.05.002","DOIUrl":null,"url":null,"abstract":"<div><div>This paper investigates impulsive orbital attack-defense (AD) games under multiple constraints and victory conditions, involving three spacecraft: attacker, target, and defender. In the AD scenario, the attacker aims to breach the defender's interception to rendezvous with the target, while the defender seeks to protect the target by blocking or actively pursuing the attacker. Four different maneuvering constraints and five potential game outcomes are incorporated to more accurately model AD game problems and increase complexity, thereby reducing the effectiveness of traditional methods such as differential games and game-tree searches. To address these challenges, this study proposes a multi-agent deep reinforcement learning solution with variable reward functions. Two attack strategies, Direct attack (DA) and Bypass attack (BA), are developed for the attacker, each focusing on different mission priorities. Similarly, two defense strategies, Direct interdiction (DI) and Collinear interdiction (CI), are designed for the defender, each optimizing specific defensive actions through tailored reward functions. Each reward function incorporates both process rewards (e.g., distance and angle) and outcome rewards, derived from physical principles and validated via geometric analysis. Extensive simulations of four strategy confrontations demonstrate average defensive success rates of 75% for DI vs. DA, 40% for DI vs. BA, 80% for CI vs. DA, and 70% for CI vs. BA. Results indicate that CI outperforms DI for defenders, while BA outperforms DA for attackers. Moreover, defenders achieve their objectives more effectively under identical maneuvering capabilities. Trajectory evolution analyses further illustrate the effectiveness of the proposed variable reward function-driven strategies. These strategies and analyses offer valuable guidance for practical orbital defense scenarios and lay a foundation for future multi-agent game research.</div></div>","PeriodicalId":58209,"journal":{"name":"Defence Technology(防务技术)","volume":"51 ","pages":"Pages 159-183"},"PeriodicalIF":5.9000,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Variable reward function-driven strategies for impulsive orbital attack-defense games under multiple constraints and victory conditions\",\"authors\":\"Liran Zhao, Sihan Xu, Qinbo Sun, Zhaohui Dang\",\"doi\":\"10.1016/j.dt.2025.05.002\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>This paper investigates impulsive orbital attack-defense (AD) games under multiple constraints and victory conditions, involving three spacecraft: attacker, target, and defender. In the AD scenario, the attacker aims to breach the defender's interception to rendezvous with the target, while the defender seeks to protect the target by blocking or actively pursuing the attacker. Four different maneuvering constraints and five potential game outcomes are incorporated to more accurately model AD game problems and increase complexity, thereby reducing the effectiveness of traditional methods such as differential games and game-tree searches. To address these challenges, this study proposes a multi-agent deep reinforcement learning solution with variable reward functions. 
Two attack strategies, Direct attack (DA) and Bypass attack (BA), are developed for the attacker, each focusing on different mission priorities. Similarly, two defense strategies, Direct interdiction (DI) and Collinear interdiction (CI), are designed for the defender, each optimizing specific defensive actions through tailored reward functions. Each reward function incorporates both process rewards (e.g., distance and angle) and outcome rewards, derived from physical principles and validated via geometric analysis. Extensive simulations of four strategy confrontations demonstrate average defensive success rates of 75% for DI vs. DA, 40% for DI vs. BA, 80% for CI vs. DA, and 70% for CI vs. BA. Results indicate that CI outperforms DI for defenders, while BA outperforms DA for attackers. Moreover, defenders achieve their objectives more effectively under identical maneuvering capabilities. Trajectory evolution analyses further illustrate the effectiveness of the proposed variable reward function-driven strategies. These strategies and analyses offer valuable guidance for practical orbital defense scenarios and lay a foundation for future multi-agent game research.</div></div>\",\"PeriodicalId\":58209,\"journal\":{\"name\":\"Defence Technology(防务技术)\",\"volume\":\"51 \",\"pages\":\"Pages 159-183\"},\"PeriodicalIF\":5.9000,\"publicationDate\":\"2025-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Defence Technology(防务技术)\",\"FirstCategoryId\":\"1087\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2214914725001497\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, MULTIDISCIPLINARY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Defence Technology(防务技术)","FirstCategoryId":"1087","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2214914725001497","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, MULTIDISCIPLINARY","Score":null,"Total":0}
Variable reward function-driven strategies for impulsive orbital attack-defense games under multiple constraints and victory conditions
This paper investigates impulsive orbital attack-defense (AD) games under multiple constraints and victory conditions, involving three spacecraft: attacker, target, and defender. In the AD scenario, the attacker aims to breach the defender's interception to rendezvous with the target, while the defender seeks to protect the target by blocking or actively pursuing the attacker. Four different maneuvering constraints and five potential game outcomes are incorporated to more accurately model AD game problems and increase complexity, thereby reducing the effectiveness of traditional methods such as differential games and game-tree searches. To address these challenges, this study proposes a multi-agent deep reinforcement learning solution with variable reward functions. Two attack strategies, Direct attack (DA) and Bypass attack (BA), are developed for the attacker, each focusing on different mission priorities. Similarly, two defense strategies, Direct interdiction (DI) and Collinear interdiction (CI), are designed for the defender, each optimizing specific defensive actions through tailored reward functions. Each reward function incorporates both process rewards (e.g., distance and angle) and outcome rewards, derived from physical principles and validated via geometric analysis. Extensive simulations of four strategy confrontations demonstrate average defensive success rates of 75% for DI vs. DA, 40% for DI vs. BA, 80% for CI vs. DA, and 70% for CI vs. BA. Results indicate that CI outperforms DI for defenders, while BA outperforms DA for attackers. Moreover, defenders achieve their objectives more effectively under identical maneuvering capabilities. Trajectory evolution analyses further illustrate the effectiveness of the proposed variable reward function-driven strategies. These strategies and analyses offer valuable guidance for practical orbital defense scenarios and lay a foundation for future multi-agent game research.
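The abstract states that each strategy's reward function combines process rewards (e.g., distance and angle terms) with outcome rewards. As a rough illustration of that structure only, the Python sketch below shows one way such a composite reward for a defender might be written; the function name, weights, geometric terms, and outcome labels are hypothetical and are not taken from the paper.

```python
import numpy as np

# Illustrative sketch (not the authors' implementation): a composite reward
# mixing dense process terms (distance, angle) with a sparse outcome term,
# in the spirit of the variable reward functions described in the abstract.
# All weights, names, and geometric choices here are assumptions.

def defender_reward(def_pos, atk_pos, tgt_pos,
                    outcome=None,
                    w_dist=1.0, w_angle=0.5, w_outcome=100.0):
    """Per-step reward = process terms + optional terminal outcome term."""
    # Process term 1: encourage the defender to close in on the attacker.
    d_def_atk = np.linalg.norm(atk_pos - def_pos)
    r_dist = -w_dist * d_def_atk

    # Process term 2: encourage a near-collinear posture, i.e. keep the
    # defender close to the attacker-target line on the target's side.
    v1 = atk_pos - def_pos
    v2 = atk_pos - tgt_pos
    cos_ang = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9)
    r_angle = w_angle * cos_ang  # in [-w_angle, +w_angle]

    # Outcome term: nonzero only at episode termination.
    r_outcome = 0.0
    if outcome == "defense_success":
        r_outcome = +w_outcome
    elif outcome == "attack_success":
        r_outcome = -w_outcome

    return r_dist + r_angle + r_outcome

# Example step (hypothetical geometry, positions in km):
# r = defender_reward(np.zeros(3), np.array([10., 0., 0.]), np.array([20., 0., 0.]))
```

An attacker-side reward (for the DA or BA strategies named in the abstract) could presumably be structured analogously, with the distance term directed toward the target and the outcome bonus awarded for a successful rendezvous, though the paper's exact formulation may differ.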
Defence Technology (防务技术). Subject areas: Mechanical Engineering; Control and Systems Engineering; Industrial and Manufacturing Engineering
CiteScore: 8.70
Self-citation rate: 0.00%
Annual publication volume: 728 articles
Average review time: 25 days
Journal description:
Defence Technology is a peer-reviewed journal published monthly that aims to become a leading international platform for academic exchange on research related to defence technology. It publishes original research papers with a direct bearing on defence, offering balanced coverage of analytical, experimental, numerical-simulation and applied investigations across various disciplines of science, technology and engineering.