Reinforcement learning based autonomous multi-rotor landing on moving platforms

IF 4.3 | Zone 3, Computer Science | Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Autonomous Robots Pub Date : 2024-06-06 DOI:10.1007/s10514-024-10162-8

Pascal Goldschmid, Aamir Ahmad

Autonomous Robots, volume 48, issues 4-5. Full text: https://link.springer.com/article/10.1007/s10514-024-10162-8 (PDF: https://link.springer.com/content/pdf/10.1007/s10514-024-10162-8.pdf)

Citations: 0

Abstract

Multi-rotor UAVs suffer from a restricted range and flight duration due to limited battery capacity. Autonomous landing on a 2D moving platform offers the possibility to replenish batteries and offload data, thus increasing the utility of the vehicle. Classical approaches rely on accurate, complex and difficult-to-derive models of the vehicle and the environment. Reinforcement learning (RL) provides an attractive alternative due to its ability to learn a suitable control policy exclusively from data during a training procedure. However, current methods require several hours to train, have limited success rates and depend on hyperparameters that need to be tuned by trial and error. We address all these issues in this work. First, we decompose the landing procedure into a sequence of simpler, but similar learning tasks. This is enabled by applying two instances of the same RL-based controller, trained for 1D motion, to control the multi-rotor’s movement in both the longitudinal and the lateral directions. Second, we introduce a powerful state space discretization technique that is based on i) kinematic modeling of the moving platform to derive information about the state space topology and ii) structuring the training as a sequential curriculum using transfer learning. Third, we leverage the kinematics model of the moving platform to also derive interpretable hyperparameters for the training process that ensure sufficient maneuverability of the multi-rotor vehicle. The training is performed using the tabular RL method Double Q-Learning. Through extensive simulations we show that the presented method significantly increases the rate of successful landings, while requiring less training time than deep RL approaches. Furthermore, for two comparison scenarios it achieves performance comparable to that of a cascaded PI controller. Finally, we deploy and demonstrate our algorithm on real hardware. For all evaluation scenarios we provide statistics on the agent’s performance. Source code is openly available at https://github.com/robot-perception-group/rl_multi_rotor_landing.
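To make the two central ideas of the abstract more concrete, the following is a minimal Python sketch of tabular Double Q-Learning together with the reuse of one 1D controller for both horizontal axes. It is not the authors' implementation (see the linked repository for that). The class `DoubleQAgent`, the helpers `discretize`, `axis_state` and `control_2d`, the state layout (binned relative position and velocity), the bin edges and all hyperparameter values are assumptions chosen for illustration only. The sketch also queries a single shared agent once per axis as a stand-in for the two controller instances described above.

```python
import random
from collections import defaultdict


class DoubleQAgent:
    """Tabular Double Q-Learning over discrete states and actions (illustrative)."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.99, epsilon=0.1, seed=0):
        self.n_actions = n_actions
        self.alpha = alpha      # learning rate (assumed value)
        self.gamma = gamma      # discount factor (assumed value)
        self.epsilon = epsilon  # exploration rate (assumed value)
        self.rng = random.Random(seed)
        # Two independent tables; unseen states default to all-zero rows.
        self.qa = defaultdict(lambda: [0.0] * n_actions)
        self.qb = defaultdict(lambda: [0.0] * n_actions)

    def greedy(self, state):
        """Return the action that maximizes the sum of both tables."""
        totals = [a + b for a, b in zip(self.qa[state], self.qb[state])]
        return max(range(self.n_actions), key=totals.__getitem__)

    def act(self, state):
        """Epsilon-greedy action selection."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        return self.greedy(state)

    def update(self, s, a, r, s_next, done):
        """One Double Q-Learning step: one table selects, the other evaluates."""
        if self.rng.random() < 0.5:
            learn, evaluate = self.qa, self.qb
        else:
            learn, evaluate = self.qb, self.qa
        target = r
        if not done:
            best = max(range(self.n_actions), key=learn[s_next].__getitem__)
            target += self.gamma * evaluate[s_next][best]
        learn[s][a] += self.alpha * (target - learn[s][a])


def discretize(value, edges):
    """Index of the bin that contains `value`, given sorted bin `edges`."""
    return sum(value >= e for e in edges)


def axis_state(rel_pos, rel_vel, pos_edges, vel_edges):
    """Hypothetical 1D state: binned relative position and velocity."""
    return (discretize(rel_pos, pos_edges), discretize(rel_vel, vel_edges))


def control_2d(agent, rel_pos_xy, rel_vel_xy, pos_edges, vel_edges):
    """Query the same 1D policy once per axis (longitudinal, lateral)."""
    return tuple(
        agent.greedy(axis_state(p, v, pos_edges, vel_edges))
        for p, v in zip(rel_pos_xy, rel_vel_xy)
    )
```

In a training loop, `act` would pick an action for the current discretized state and `update` would be called with the observed transition; at deployment, `control_2d` returns one discrete action index per axis. Keeping two tables and letting one select the action while the other evaluates it is what distinguishes Double Q-Learning from plain Q-Learning, since it reduces the overestimation caused by maximizing over noisy estimates.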


Source journal

Autonomous Robots (Engineering & Technology, Robotics)

CiteScore: 7.90

Self-citation rate: 5.70%

Articles published: not shown

Review time: 3 months

Journal description: Autonomous Robots reports on the theory and applications of robotic systems capable of some degree of self-sufficiency. It features papers that include performance data on actual robots in the real world. Coverage includes: control of autonomous robots · real-time vision · autonomous wheeled and tracked vehicles · legged vehicles · computational architectures for autonomous systems · distributed architectures for learning, control and adaptation · studies of autonomous robot systems · sensor fusion · theory of autonomous systems · terrain mapping and recognition · self-calibration and self-repair for robots · self-reproducing intelligent structures · genetic algorithms as models for robot development. The focus is on the ability to move and be self-sufficient, not on whether the system is an imitation of biology. Of course, biological models for robotic systems are of major interest to the journal since living systems are prototypes for autonomous behavior.