Policy Gradient-based Integral Reinforcement Learning for Optimal Control Design of Nonaffine Morphing Aircraft Systems

2020 28th Mediterranean Conference on Control and Automation (MED). Pub Date: 2020-09-01. DOI: 10.1109/MED48518.2020.9183024

Hanna Lee, Seong-hun Kim, Youdan Kim


Citations: 2

Abstract

This paper proposes an online, model-free optimal control design strategy for general continuous-time nonlinear nonaffine systems using policy gradient-based integral reinforcement learning. For a nonaffine system such as a morphing-wing aircraft whose morphing parameters act as control effectors, general nonlinear control design methods cannot be applied, and solving the Hamilton-Jacobi-Bellman equation analytically is difficult. The proposed online optimal control algorithm uses an actor-critic structure built on a Q-function and a policy gradient scheme, and it applies the integral reinforcement learning approach to estimate the actor and critic parameters for the continuous-time system. A closed-loop stability analysis of the designed method is presented. The method yields an optimal controller for general nonaffine systems, offers a computational advantage for complex systems, and does not require the entire dynamic model. Simulation results demonstrate the effectiveness of the proposed scheme.
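The actor-critic idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm and does not reproduce the aircraft model. It assumes a hypothetical scalar plant dx/dt = A*x + B*u, which is affine in the control and was chosen only so the result can be checked against the scalar Riccati solution. It also assumes a quadratic critic V(x) = p*x^2 and a linear actor u = -k*x. The names `explore`, `rollout_window`, `fit_critic`, and `learn_gain` are illustrative. The critic is fitted from the integral relation p*(x(t)^2 - x(t+T)^2) + g*z2 = z1 using measured data only, and the actor then takes a gradient step on the control-dependent part of the Q-function.

```python
import math

# Hypothetical toy plant (NOT the paper's morphing-aircraft model):
# dx/dt = A*x + B*u. A and B appear only inside the simulator;
# the learner below never reads them.
A, B = -1.0, 1.0
Q, R = 1.0, 1.0  # running cost r(x, u) = Q*x^2 + R*u^2


def explore(t):
    """Probing signal added to the policy so the regression is well posed."""
    return 0.5 * (math.sin(2.0 * t) + math.sin(5.0 * t))


def rollout_window(x, t, k, T, dt):
    """RK4-integrate over [t, t+T] under u = -k*x + explore(t).

    Returns (x_end, z1, z2), where
      z1 = integral of the on-policy cost (Q + R*k^2) * x^2,
      z2 = integral of 2 * x * explore(t).
    """
    def deriv(s, tau):
        xs = s[0]
        e = explore(tau)
        u = -k * xs + e
        return (A * xs + B * u, (Q + R * k * k) * xs * xs, 2.0 * xs * e)

    s = (x, 0.0, 0.0)
    for i in range(int(round(T / dt))):
        tau = t + i * dt
        k1 = deriv(s, tau)
        k2 = deriv(tuple(a + 0.5 * dt * b for a, b in zip(s, k1)), tau + 0.5 * dt)
        k3 = deriv(tuple(a + 0.5 * dt * b for a, b in zip(s, k2)), tau + 0.5 * dt)
        k4 = deriv(tuple(a + dt * b for a, b in zip(s, k3)), tau + dt)
        s = tuple(a + dt / 6.0 * (b1 + 2.0 * b2 + 2.0 * b3 + b4)
                  for a, b1, b2, b3, b4 in zip(s, k1, k2, k3, k4))
    return s


def fit_critic(k, windows=50, T=0.2, dt=0.01, x0=1.0):
    """Critic: least-squares fit of (p, g) over consecutive windows from
    p*(x_t^2 - x_{t+T}^2) + g*z2 = z1, where g plays the role of p*B."""
    s11 = s12 = s22 = b1 = b2 = 0.0
    x, t = x0, 0.0
    for _ in range(windows):
        x_next, z1, z2 = rollout_window(x, t, k, T, dt)
        d = x * x - x_next * x_next
        s11 += d * d
        s12 += d * z2
        s22 += z2 * z2
        b1 += d * z1
        b2 += z2 * z1
        x, t = x_next, t + T
    det = s11 * s22 - s12 * s12  # 2x2 normal equations, solved in closed form
    p = (b1 * s22 - b2 * s12) / det
    g = (s11 * b2 - s12 * b1) / det
    return p, g


def learn_gain(iterations=30, step=0.5, k=0.0):
    """Actor: gradient step on the u-dependent part of the Q-function,
    R*u^2 + 2*g*x*u, evaluated at u = -k*x (the x^2 factor is absorbed
    into the step size)."""
    p = 0.0
    for _ in range(iterations):
        p, g = fit_critic(k)
        k += step * (g - R * k)
    return k, p


if __name__ == "__main__":
    k, p = learn_gain()
    print(f"learned gain k = {k:.4f}, critic weight p = {p:.4f}")
```

For these assumed values, the scalar Riccati equation -2p - p^2 + 1 = 0 gives p = k = sqrt(2) - 1, about 0.414, which the learned gain should approach. A nonaffine plant would need a richer critic basis and a parameterized actor, which this sketch does not attempt.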
