Self-training by Reinforcement Learning for Full-autonomous Drones of the Future*

Kjell Kersandt, G. Muñoz, C. Barrado
DOI: 10.1109/DASC.2018.8569503
Published in: 2018 IEEE/AIAA 37th Digital Avionics Systems Conference (DASC), September 2018
Citations: 22

Abstract

Drones are rapidly increasing their activity in the airspace worldwide. This expected growth in the number of drones makes human-based traffic management prohibitive. Avionics systems able to sense and avoid obstacles, and especially visual flight rules (VFR) traffic, are under research. Moreover, to overcome lost-link contingencies, drones have to be able to act autonomously. In this paper we present a drone concept with a full level of autonomy based on Deep Reinforcement Learning (DRL). From the first flight until the accomplishment of its final mission, the drone has no need for a pilot. The only human intervention is the engineer programming the artificial intelligence algorithm used to train and then to control the drone. In this paper we present preliminary results for an environment which is a realistic flight simulator, and an agent that is a quad-copter drone able to execute three actions. The inputs of the agent are the current state and the accumulated reward. Experiments include self-learning periods of up to 3 days, followed by one hundred fully autonomous flight tests. Three different DRL algorithms, based on Q-learning reinforcement learning, were used to obtain the training models. Results are very promising, with around 80 percent of test flights reaching the target. In comparison with the results of a human pilot, acting in the same simulated environment and using the same three actions, the DRL methods demonstrated unequal results, depending on the learning algorithm used. We applied some enhancements in the training, with the creation of checkpoints of the training model every time a better solution is found. In the near future we expect to achieve results similar to the performance of a human pilot, supporting the idea of fully autonomous drones through DRL methods.
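The abstract describes Q-learning-based DRL training with a three-action agent and a checkpoint of the training model whenever a better solution is found. A minimal sketch of that scheme is below, using tabular Q-learning on a toy 1-D environment; the environment, reward values, and hyperparameters are illustrative assumptions, not the authors' flight simulator or settings.

```python
import random

# Toy stand-in for the paper's simulator: an agent on a 1-D line must
# reach a target position. Three discrete actions, as in the paper.
ACTIONS = (-1, 0, 1)
START, TARGET, LIMIT = 0, 8, 12

def step(state, action):
    """Advance the toy environment; return (next_state, reward, done)."""
    nxt = max(-LIMIT, min(LIMIT, state + action))
    if nxt == TARGET:
        return nxt, 10.0, True        # reached the target
    return nxt, -1.0, False           # small per-step cost

def train(episodes=2000, alpha=0.5, gamma=0.95, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = {}                            # Q-table: (state, action) -> value
    best_return, checkpoint = float("-inf"), {}
    for _ in range(episodes):
        state, total = START, 0.0
        for _ in range(50):           # episode length cap
            if rng.random() < eps:    # epsilon-greedy exploration
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
            nxt, reward, done = step(state, action)
            best_next = max(q.get((nxt, a), 0.0) for a in ACTIONS)
            old = q.get((state, action), 0.0)
            # Standard Q-learning update
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            state, total = nxt, total + reward
            if done:
                break
        if total > best_return:       # checkpoint on improvement, mirroring
            best_return = total       # the paper's training enhancement
            checkpoint = dict(q)
    return checkpoint, best_return

q_best, best_return = train()
```

A greedy rollout over the checkpointed Q-table (always picking the highest-valued action) then plays the role of the fully autonomous test flights, with no further exploration.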