{"title":"基于深度强化学习的翼在地飞行器全局和局部轨迹规划多目标奖励塑造","authors":"H. Hu, D. Li, G. Zhang, Z. Zhang","doi":"10.1017/aer.2023.43","DOIUrl":null,"url":null,"abstract":"\n The control of a wing-in-ground craft (WIG) usually allows for many needs, like cruising, speed, survival and stealth. Various degrees of emphasis on these requirements result in different trajectories, but there has not been a way of integrating and quantifying them yet. Moreover, most previous studies on other vehicles’ multi-objective trajectory is planned globally, lacking for local planning. For the multi-objective trajectory planning of WIGs, this paper proposes a multi-objective function in a polynomial form, in which each item represents an independent requirement and is adjusted by a linear or exponential weight. It uses the magnitude of weights to demonstrate how much attention is paid relatively to the corresponding demand. Trajectories of a virtual WIG model above the wave trough terrain are planned using reward shaping based on the introduced multi-objective function and deep reinforcement learning (DRL). Two conditions are considered globally and locally: a single scheme of weights is assigned to the whole environment, and two different schemes of weights are assigned to the two parts of the environment. Effectiveness of the multi-object reward function is analysed from the local and global perspectives. The reward function provides WIGs with a universal framework for adjusting the magnitude of weights, to meet different degrees of requirements on cruising, speed, stealth and survival, and helps WIGs guide an expected trajectory in engineering.","PeriodicalId":22567,"journal":{"name":"The Aeronautical Journal (1968)","volume":"14 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2023-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Multi-objective reward shaping for global and local trajectory planning of wing-in-ground crafts based on deep reinforcement learning\",\"authors\":\"H. Hu, D. Li, G. Zhang, Z. Zhang\",\"doi\":\"10.1017/aer.2023.43\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"\\n The control of a wing-in-ground craft (WIG) usually allows for many needs, like cruising, speed, survival and stealth. Various degrees of emphasis on these requirements result in different trajectories, but there has not been a way of integrating and quantifying them yet. Moreover, most previous studies on other vehicles’ multi-objective trajectory is planned globally, lacking for local planning. For the multi-objective trajectory planning of WIGs, this paper proposes a multi-objective function in a polynomial form, in which each item represents an independent requirement and is adjusted by a linear or exponential weight. It uses the magnitude of weights to demonstrate how much attention is paid relatively to the corresponding demand. Trajectories of a virtual WIG model above the wave trough terrain are planned using reward shaping based on the introduced multi-objective function and deep reinforcement learning (DRL). Two conditions are considered globally and locally: a single scheme of weights is assigned to the whole environment, and two different schemes of weights are assigned to the two parts of the environment. Effectiveness of the multi-object reward function is analysed from the local and global perspectives. The reward function provides WIGs with a universal framework for adjusting the magnitude of weights, to meet different degrees of requirements on cruising, speed, stealth and survival, and helps WIGs guide an expected trajectory in engineering.\",\"PeriodicalId\":22567,\"journal\":{\"name\":\"The Aeronautical Journal (1968)\",\"volume\":\"14 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-06-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"The Aeronautical Journal (1968)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1017/aer.2023.43\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"The Aeronautical Journal (1968)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1017/aer.2023.43","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Multi-objective reward shaping for global and local trajectory planning of wing-in-ground crafts based on deep reinforcement learning
The control of a wing-in-ground craft (WIG) usually allows for many needs, like cruising, speed, survival and stealth. Various degrees of emphasis on these requirements result in different trajectories, but there has not been a way of integrating and quantifying them yet. Moreover, most previous studies on other vehicles’ multi-objective trajectory is planned globally, lacking for local planning. For the multi-objective trajectory planning of WIGs, this paper proposes a multi-objective function in a polynomial form, in which each item represents an independent requirement and is adjusted by a linear or exponential weight. It uses the magnitude of weights to demonstrate how much attention is paid relatively to the corresponding demand. Trajectories of a virtual WIG model above the wave trough terrain are planned using reward shaping based on the introduced multi-objective function and deep reinforcement learning (DRL). Two conditions are considered globally and locally: a single scheme of weights is assigned to the whole environment, and two different schemes of weights are assigned to the two parts of the environment. Effectiveness of the multi-object reward function is analysed from the local and global perspectives. The reward function provides WIGs with a universal framework for adjusting the magnitude of weights, to meet different degrees of requirements on cruising, speed, stealth and survival, and helps WIGs guide an expected trajectory in engineering.