Shuai Ye, Yingjiang Zhou, Guoping Jiang, Qiong Lin
Optimization control of UAVs based on self-learning adaptive dynamic programming
2020 35th Youth Academic Annual Conference of Chinese Association of Automation (YAC), 2020-10-16
DOI: https://doi.org/10.1109/YAC51587.2020.9337696
Abstract
Optimal control of UAVs has attracted increasing attention. This paper proposes a self-learning adaptive dynamic programming (ADP) architecture based on reinforcement learning (RL) to obtain optimal control for UAVs. The traditional ADP architecture uses two networks: one to generate the policy and one to evaluate it. We propose adding a third network that replaces the external reward signal, so that the agent generates its reward signal itself instead of obtaining it through interaction with the environment. The proposed self-learning ADP method improves control performance through online learning while keeping the system state stable at the equilibrium point. Finally, the proposed control algorithm is applied to quadrotor UAVs, and the experimental results demonstrate its effectiveness.
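
The three-network arrangement described in the abstract can be illustrated with a minimal forward-pass sketch. The abstract does not give the network structures, their inputs, or the training rules, so everything below is an assumption made for illustration: the layer sizes, the tanh activations, the names `actor`, `reward_net`, and `critic`, and the wiring in which the reward network sees the state and action and the critic additionally sees the internal reward. No learning update is included.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_mlp(n_in, n_hidden, n_out):
    """Weights of a one-hidden-layer network (sizes are hypothetical)."""
    return {
        "W1": rng.normal(scale=0.1, size=(n_hidden, n_in)),
        "W2": rng.normal(scale=0.1, size=(n_out, n_hidden)),
    }

def forward(net, x):
    """tanh hidden layer followed by a tanh output layer."""
    return np.tanh(net["W2"] @ np.tanh(net["W1"] @ x))

# Hypothetical dimensions, not taken from the paper.
STATE_DIM, ACTION_DIM, HIDDEN = 4, 2, 8

actor = make_mlp(STATE_DIM, HIDDEN, ACTION_DIM)               # makes the policy
reward_net = make_mlp(STATE_DIM + ACTION_DIM, HIDDEN, 1)      # internal reward
critic = make_mlp(STATE_DIM + ACTION_DIM + 1, HIDDEN, 1)      # evaluates the policy

def adp_step(x):
    """One forward pass through the assumed three-network wiring."""
    u = forward(actor, x)                             # control action
    s = forward(reward_net, np.concatenate([x, u]))   # self-generated reward
    J = forward(critic, np.concatenate([x, u, s]))    # cost-to-go estimate
    return u, s, J

x = np.array([0.1, -0.2, 0.05, 0.0])  # example state, arbitrary values
u, s, J = adp_step(x)
print(u.shape, s.shape, J.shape)
```

In this sketch the only signal the critic receives about performance is `s`, which is produced by `reward_net` rather than supplied from outside; that is the role the abstract assigns to the third network. How the three networks are trained online is left open here because the abstract does not state it.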