Prediction Error-Based Action Policy Learning for Quadcopter Flight Control

Jamal Shams Khanzada, Wasif Muhammad, M. J. Irshad
Journal: Engineering Proceedings
DOI: 10.3390/engproc2021012047
Published: 2021-12-29 (Journal Article)
Citations: 0

Abstract

Quadcopters are finding their place in almost every part of daily life, from transportation and delivery to hospitals and homes. Where human intervention in quadcopter flight control is impossible, drones must be equipped with intelligent autopilot systems so that they can make decisions on their own. Previous reinforcement learning (RL)-based efforts at quadcopter flight control in complex, dynamic, and unstructured environments have failed during the training phase to avoid the catastrophic crashes typical of naturally unstable quadcopters. In this work, we propose a complementary approach to quadcopter flight control that uses prediction error in the sensory space as the control policy reward, instead of rewards drawn from unstable action spaces as in conventional RL approaches. The proposed predictive coding/biased competition with divisive input modulation (PC/BC-DIM) neural network learns a prediction error-based flight control policy without physically actuating the quadcopter's propellers, which ensures its safety during training. Because the network learns the flight control policy without any physical flights, training time is reduced to almost zero. Simulation results showed that the trained agent reached the destination accurately. Over 20 quadcopter flight trials, the average path deviation from the ground truth was 1.495 and the root mean square (RMS) goal-reach error was 1.708.
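To make the mechanism behind the abstract concrete, the following is a minimal sketch of one PC/BC-DIM inference loop in the style of divisive input modulation, where a divisive prediction error `e` mediates competition between prediction neurons. All names, dimensions, and parameter values here are illustrative assumptions for exposition; they are not taken from the paper, which applies the network to flight control rather than to this toy pattern-matching task.

```python
import numpy as np

def pcbc_dim(x, W, n_iter=50, eps1=1e-6, eps2=1e-3):
    """One PC/BC-DIM inference loop (a common DIM formulation):
    e = x / (eps2 + V^T y)   -- divisive prediction error in sensory space
    y = (eps1 + y) * (W e)   -- multiplicative response update
    x: nonnegative input vector; W: nonnegative weight matrix (neurons x inputs).
    """
    # Feedback weights V: rows of W rescaled to a maximum of 1 (an assumed
    # normalization choice; variants differ on this detail).
    V = W / np.maximum(W.max(axis=1, keepdims=True), 1e-9)
    y = np.zeros(W.shape[0])
    for _ in range(n_iter):
        e = x / (eps2 + V.T @ y)   # residual: input not yet explained by predictions
        y = (eps1 + y) * (W @ e)   # neurons matching the residual are amplified
    return y, e

rng = np.random.default_rng(0)
W = rng.random((4, 16))        # 4 prediction neurons over a 16-d sensory input
x = W[2] / W[2].max()          # present an input matching stored pattern 2
y, e = pcbc_dim(x, W)
print(int(np.argmax(y)))       # index of the most active prediction neuron
```

The point of the sketch is that learning and inference are driven entirely by the sensory-space error `e`: where predictions explain the input, `e` falls toward a baseline and activity stabilizes, which is the quantity the paper repurposes as a control policy reward.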