{"title":"基于自适应增强随机搜索的飞船追逃博弈策略研究","authors":"Jie Jiao, Yongjie Gou, Wenbo Wu, Binfeng Pan","doi":"10.1051/jnwpu/20244210117","DOIUrl":null,"url":null,"abstract":"To solve the problem of the survival differential policy interception between a spacecraft and a non-cooperative target pursuit game, the pursuit game policy is studied based on reinforcement learning, and the adaptive-augmented random search algorithm is proposed. Firstly, to solve the sparse reward problem of sequential decision making, an exploration method based on the spatial perturbation of parameters of the policy is designed, thus accelerating its convergence speed. Secondly, to avoid the possibility of falling into local optimum prematurely, a novelty degree function is designed to guide the policy update, enhancing the efficiency of data utilization. Finally, the effectiveness and advancement of the exploration method are verified with numerical simulations and compared with those of the augmented random search algorithm, the proximal policy optimization algorithm and the deep deterministic policy gradient algorithm.","PeriodicalId":515230,"journal":{"name":"Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University","volume":"279 2","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Research on game strategy of spacecraft chase and escape based on adaptive augmented random search\",\"authors\":\"Jie Jiao, Yongjie Gou, Wenbo Wu, Binfeng Pan\",\"doi\":\"10.1051/jnwpu/20244210117\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"To solve the problem of the survival differential policy interception between a spacecraft and a non-cooperative target pursuit game, the pursuit game policy is studied based on reinforcement learning, and the adaptive-augmented random search algorithm is proposed. Firstly, to solve the sparse reward problem of sequential decision making, an exploration method based on the spatial perturbation of parameters of the policy is designed, thus accelerating its convergence speed. Secondly, to avoid the possibility of falling into local optimum prematurely, a novelty degree function is designed to guide the policy update, enhancing the efficiency of data utilization. Finally, the effectiveness and advancement of the exploration method are verified with numerical simulations and compared with those of the augmented random search algorithm, the proximal policy optimization algorithm and the deep deterministic policy gradient algorithm.\",\"PeriodicalId\":515230,\"journal\":{\"name\":\"Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University\",\"volume\":\"279 2\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-02-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1051/jnwpu/20244210117\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1051/jnwpu/20244210117","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Research on game strategy of spacecraft chase and escape based on adaptive augmented random search
To solve the problem of the survival differential policy interception between a spacecraft and a non-cooperative target pursuit game, the pursuit game policy is studied based on reinforcement learning, and the adaptive-augmented random search algorithm is proposed. Firstly, to solve the sparse reward problem of sequential decision making, an exploration method based on the spatial perturbation of parameters of the policy is designed, thus accelerating its convergence speed. Secondly, to avoid the possibility of falling into local optimum prematurely, a novelty degree function is designed to guide the policy update, enhancing the efficiency of data utilization. Finally, the effectiveness and advancement of the exploration method are verified with numerical simulations and compared with those of the augmented random search algorithm, the proximal policy optimization algorithm and the deep deterministic policy gradient algorithm.