Research on game strategy of spacecraft chase and escape based on adaptive augmented random search

Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University. Pub Date: 2024-02-01. DOI: 10.1051/jnwpu/20244210117

Jie Jiao, Yongjie Gou, Wenbo Wu, Binfeng Pan


Citations: 0

Abstract

To address the interception problem in a survival-type differential pursuit-evasion game between a spacecraft and a non-cooperative target, the pursuit-evasion policy is studied using reinforcement learning, and an adaptive augmented random search algorithm is proposed. First, to handle the sparse-reward problem in sequential decision making, an exploration method based on perturbations in the policy parameter space is designed, which accelerates convergence. Second, to avoid converging prematurely to a local optimum, a novelty function is designed to guide the policy update, which improves data efficiency. Finally, numerical simulations verify the effectiveness of the exploration method, and its performance is compared with that of the augmented random search (ARS), proximal policy optimization (PPO), and deep deterministic policy gradient (DDPG) algorithms.
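The abstract does not give the update rule or the form of the novelty function, so the following is only a minimal sketch under stated assumptions. It uses the basic augmented random search update (symmetric perturbations in parameter space, selection of the best directions, normalization by the reward standard deviation) and adds a hypothetical novelty bonus computed as the mean distance to the nearest behaviors in an archive. The names `novelty`, `ars_step`, `toy_rollout` and `demo`, the archive, the toy reward, and all hyperparameter values are illustrative and are not taken from the paper; the paper's adaptive mechanism is not modeled here.

```python
import math
import random

TARGET = [3.0, -2.0]  # hypothetical goal for the toy problem, not from the paper


def novelty(behavior, archive, k=3):
    """Mean Euclidean distance from `behavior` to its k nearest archive entries.

    Assumed k-nearest-neighbor form; returns 0.0 when the archive is empty.
    """
    if not archive:
        return 0.0
    nearest = sorted(math.dist(behavior, other) for other in archive)[:k]
    return sum(nearest) / len(nearest)


def ars_step(theta, rollout, archive, rng, n_dirs=8, top_b=4,
             step_size=0.05, noise_std=0.1, novelty_weight=0.0):
    """One ARS-style update of `theta`; `rollout(params)` returns (reward, behavior)."""
    candidates = []
    new_behaviors = []
    for _ in range(n_dirs):
        # Perturb the policy parameters, not the actions.
        delta = [rng.gauss(0.0, 1.0) for _ in theta]
        scores = []
        for sign in (1.0, -1.0):
            params = [t + sign * noise_std * d for t, d in zip(theta, delta)]
            reward, behavior = rollout(params)
            # Score = task reward plus weighted novelty against the archive.
            scores.append(reward + novelty_weight * novelty(behavior, archive))
            new_behaviors.append(behavior)
        candidates.append((scores[0], scores[1], delta))
    archive.extend(new_behaviors)

    # Keep the directions whose better-scoring side ranks highest.
    candidates.sort(key=lambda c: max(c[0], c[1]), reverse=True)
    best = candidates[:top_b]
    used = [s for c in best for s in c[:2]]
    mean = sum(used) / len(used)
    sigma = math.sqrt(sum((s - mean) ** 2 for s in used) / len(used))
    if sigma < 1e-12:
        return list(theta)  # no score signal: leave the parameters unchanged
    scale = step_size / (len(best) * sigma)
    return [t + scale * sum((c[0] - c[1]) * c[2][i] for c in best)
            for i, t in enumerate(theta)]


def toy_rollout(params):
    """Toy stand-in for a simulation: reward is negative squared distance to TARGET.

    The behavior descriptor is simply the parameter vector (an assumption).
    """
    reward = -sum((p - g) ** 2 for p, g in zip(params, TARGET))
    return reward, list(params)


def demo(steps=200, seed=0, novelty_weight=0.0):
    """Run repeated updates from the origin and return the final parameters."""
    rng = random.Random(seed)
    theta, archive = [0.0, 0.0], []
    for _ in range(steps):
        theta = ars_step(theta, toy_rollout, archive, rng,
                         novelty_weight=novelty_weight)
    return theta
```

Calling `demo()` runs the plain update on the toy objective; passing a positive `novelty_weight` adds the bonus, which in a sparse-reward setting would still rank perturbation directions when the task reward alone is flat. A real pursuit-evasion study would replace `toy_rollout` with an orbital dynamics simulation and a behavior descriptor chosen for that problem.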

