{"title":"Online Hybrid Learning to Speed Up Deep Reinforcement Learning Method for Commercial Aircraft Control","authors":"Minjian Xin, Yue Gao, Tianhao Mou, Jianlong Ye","doi":"10.1109/ISASS.2019.8757756","DOIUrl":null,"url":null,"abstract":"We propose an online hybrid learning algorithm that enables deep reinforcement learning agents to learn in environments where the cost of exploration is expensive. Our algorithm adopts ideas from imitation learning and Deep Deterministic Policy Gradient (DDPG). It utilizes an existing baseline controller to speed up the process of learning as well as lower the exploration cost. Our algorithm is validated on classic pendulum swing-up problem and shows faster convergence speed and lower exploration cost. Furthermore, the algorithm can also be applied in learning a controller for commercial aircraft cruising. While DDPG fails to learn a decent policy, our hybrid learning algorithm is able to learn quickly in an online manner with low cost. Our experiments show that the learned policy network is more robust than the baseline PID controller.","PeriodicalId":359959,"journal":{"name":"2019 3rd International Symposium on Autonomous Systems (ISAS)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 3rd International Symposium on Autonomous Systems (ISAS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISASS.2019.8757756","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Online Hybrid Learning to Speed Up Deep Reinforcement Learning Method for Commercial Aircraft Control
We propose an online hybrid learning algorithm that enables deep reinforcement learning agents to learn in environments where exploration is expensive. The algorithm combines ideas from imitation learning and Deep Deterministic Policy Gradient (DDPG): it uses an existing baseline controller both to speed up learning and to lower the cost of exploration. We validate the algorithm on the classic pendulum swing-up problem, where it converges faster and incurs a lower exploration cost. The algorithm can also be applied to learning a controller for commercial aircraft cruising. On this task DDPG fails to learn a decent policy, whereas our hybrid learning algorithm learns quickly, online, and at low cost. Our experiments show that the learned policy network is more robust than the baseline PID controller.
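The abstract does not give the algorithm's details, so the following is only an illustrative sketch of one way a baseline controller could be combined with a DDPG-style actor, not the authors' method. It assumes two mechanisms that the abstract does not state: the executed action is a blend of the baseline and learned actions, and the actor objective adds an imitation penalty to the usual DDPG term. The names `PIDController`, `mixing_weight`, `hybrid_action`, `hybrid_actor_loss`, and the parameters `beta`, `lam`, and `decay_steps` are all hypothetical.

```python
import numpy as np


class PIDController:
    """Textbook discrete PID controller, standing in for the baseline controller."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def act(self, error):
        # Accumulate the integral term and estimate the derivative by finite difference.
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def mixing_weight(step, decay_steps):
    """Assumed schedule: the weight on the baseline falls linearly from 1 to 0."""
    return max(0.0, 1.0 - step / decay_steps)


def hybrid_action(policy_action, baseline_action, beta):
    """Blend baseline and learned actions; beta=1 means pure baseline (assumption)."""
    return beta * baseline_action + (1.0 - beta) * policy_action


def hybrid_actor_loss(q_values, policy_actions, baseline_actions, lam):
    """DDPG-style actor term (-mean Q) plus an assumed imitation penalty.

    The penalty is the mean squared distance between the policy's actions
    and the baseline controller's actions, weighted by lam.
    """
    rl_term = -np.mean(q_values)
    imitation_term = np.mean((policy_actions - baseline_actions) ** 2)
    return rl_term + lam * imitation_term
```

Under these assumptions, early in training the agent mostly executes the baseline's actions, which keeps exploration cheap, and control shifts to the learned policy as `beta` decays. In a full implementation the loss would be computed with an automatic differentiation library so that gradients reach the actor's parameters; NumPy is used here only to keep the arithmetic visible.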