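Multi-objective reward shaping for global and local trajectory planning of wing-in-ground crafts based on deep reinforcement learning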

The Aeronautical Journal (1968), Pub Date: 2023-06-14, DOI: 10.1017/aer.2023.43

H. Hu, D. Li, G. Zhang, Z. Zhang


Citations: 0

Abstract


Multi-objective reward shaping for global and local trajectory planning of wing-in-ground crafts based on deep reinforcement learning

Controlling a wing-in-ground craft (WIG) usually has to account for several requirements, such as cruising, speed, survival and stealth. Different degrees of emphasis on these requirements result in different trajectories, but there has so far been no way of integrating and quantifying them. Moreover, most previous studies of multi-objective trajectory planning for other vehicles plan globally and lack local planning. For the multi-objective trajectory planning of WIGs, this paper proposes a multi-objective function in polynomial form, in which each term represents an independent requirement and is adjusted by a linear or exponential weight. The magnitude of each weight indicates how much relative attention is paid to the corresponding requirement. Trajectories of a virtual WIG model above wave-trough terrain are planned using deep reinforcement learning (DRL) with reward shaping based on the introduced multi-objective function. Two cases are considered, global and local: a single scheme of weights is assigned to the whole environment, and two different schemes of weights are assigned to two parts of the environment. The effectiveness of the multi-objective reward function is analysed from both the local and global perspectives. The reward function provides WIGs with a universal framework in which the magnitudes of the weights can be adjusted to meet different degrees of requirements on cruising, speed, stealth and survival, and it helps guide WIGs along an expected trajectory in engineering practice.
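The abstract does not give the reward formula, so the following is only a minimal sketch of what a polynomial multi-objective reward with linear and exponential weights could look like. It assumes each requirement is already summarised as a score in [0, 1] and that a term has the form `w * score ** p`; the names `WeightScheme`, `shaped_reward` and `scheme_for_position`, and the single `boundary` that splits the environment into two parts, are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Mapping

# The four requirements named in the abstract.
REQUIREMENTS = ("cruising", "speed", "survival", "stealth")


@dataclass
class WeightScheme:
    """Hypothetical weight scheme: one linear weight and one exponent per requirement."""
    linear: Mapping[str, float]
    exponent: Mapping[str, float]


def shaped_reward(scores: Mapping[str, float], scheme: WeightScheme) -> float:
    """Sum one term per requirement, each of the assumed form w * score ** p."""
    total = 0.0
    for name in REQUIREMENTS:
        # Clamp to [0, 1] so that fractional exponents stay well defined.
        score = min(max(scores[name], 0.0), 1.0)
        total += scheme.linear[name] * score ** scheme.exponent[name]
    return total


def scheme_for_position(x: float, boundary: float,
                        first: WeightScheme, second: WeightScheme) -> WeightScheme:
    """Local case: pick a weight scheme according to which part of the environment x is in."""
    return first if x < boundary else second
```

In this sketch the global case corresponds to calling `shaped_reward` with one scheme everywhere, and the local case to calling `scheme_for_position` first, so that the emphasis on each requirement changes once the craft crosses the assumed boundary.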
