Online critic-identifier-actor algorithm for optimal control of nonlinear systems
Authors: H. Lin, Qinglai Wei, Derong Liu
Venue: 2015 Sixth International Conference on Intelligent Control and Information Processing (ICICIP)
Published: 2015-11-01
DOI: https://doi.org/10.1109/ICICIP.2015.7388204
This paper designs a novel critic-identifier-actor optimal control scheme for discrete-time affine nonlinear systems with uncertainties. A neural identifier learns the unknown control coefficient matrix of the affine nonlinear system and works together with an actor-critic scheme to solve the optimal control problem online and forward in time, without value or policy iterations. A critic network learns an approximate value function at each step, and an actor network attempts to improve the current policy based on that approximate value function. The weights of all neural networks (NNs) are updated at each sampling instant. Lyapunov theory is used to prove the stability of the closed-loop system. A simulation example illustrates the effectiveness of the developed method.
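
The abstract does not give the update laws, so the following is only a rough sketch of how a critic-identifier-actor loop of this kind might be organized, not the authors' algorithm. It assumes a system of the form x_{k+1} = f(x_k) + g(x_k) u_k with a known drift f, a quadratic utility x'Qx + u'Ru, linear-in-parameter approximators, normalized gradient updates, and a small probing noise on the control. The plant, the feature functions, the learning rates, and every function name below are hypothetical choices made for illustration.

```python
import numpy as np

N, M = 2, 1                      # state and control dimensions (illustrative)
Q, R = np.eye(N), np.eye(M)      # assumed quadratic utility x'Qx + u'Ru

def f(x):
    """Drift term, assumed known here (hypothetical stable plant)."""
    return np.array([0.9 * x[0] + 0.1 * x[1], -0.2 * x[0] + 0.8 * x[1]])

def g_true(x):
    """True control coefficient matrix (N x M); hidden from the learner."""
    return np.array([[0.0], [1.0 + 0.1 * np.cos(x[0])]])

def phi_g(x):
    """Identifier features (hypothetical choice)."""
    return np.array([1.0, np.tanh(x[0]), np.tanh(x[1])])

def phi_c(x):
    """Critic features: quadratic monomials of the state."""
    return np.array([x[0] ** 2, x[0] * x[1], x[1] ** 2])

def dphi_c(x):
    """Jacobian of phi_c with respect to x, shape (3, N)."""
    return np.array([[2 * x[0], 0.0], [x[1], x[0]], [0.0, 2 * x[1]]])

def g_hat(W_g, x):
    """Identifier estimate of the control coefficient matrix, shape (N, M)."""
    return (W_g.T @ phi_g(x)).reshape(N, M)

def identifier_step(W_g, x, u, x_next, lr=0.5):
    """Normalized gradient step on the one-step state prediction error."""
    p = phi_g(x)
    e = x_next - (f(x) + g_hat(W_g, x) @ u)
    scale = 1.0 + (p @ p) * (u @ u)
    return W_g + lr * np.outer(p, np.outer(e, u).ravel()) / scale

def critic_step(W_c, x, u, x_next, lr=0.1):
    """Normalized gradient step on the squared Bellman residual."""
    d = phi_c(x_next) - phi_c(x)
    delta = x @ Q @ x + u @ R @ u + W_c @ d
    return W_c - lr * delta * d / (1.0 + d @ d)

def actor_step(W_a, W_c, W_g, x, x_next, lr=0.1):
    """Move the actor output toward the control implied by critic and identifier."""
    grad_v = dphi_c(x_next).T @ W_c
    target = -0.5 * np.linalg.solve(R, g_hat(W_g, x).T @ grad_v)
    err = W_a.T @ x - target
    return W_a - lr * np.outer(x, err) / (1.0 + x @ x)

def run(steps=300, seed=0):
    """Update all three weight sets once per sampling instant, forward in time."""
    rng = np.random.default_rng(seed)
    W_g, W_c, W_a = np.zeros((3, N * M)), np.zeros(3), np.zeros((N, M))
    x = np.array([1.0, -1.0])
    for _ in range(steps):
        u = W_a.T @ x + 0.05 * rng.standard_normal(M)   # assumed probing noise
        x_next = f(x) + g_true(x) @ u                   # plant response
        W_g = identifier_step(W_g, x, u, x_next)
        W_c = critic_step(W_c, x, u, x_next)
        W_a = actor_step(W_a, W_c, W_g, x, x_next)
        x = x_next
    return W_g, W_c, W_a, x
```

Calling `run()` returns the identifier, critic, and actor weights together with the final state. The probing noise is included because, with all weights initialized to zero, the control would otherwise stay at zero and the identifier would receive no information about g. The sketch makes no claim of convergence to the optimal policy; the paper's Lyapunov analysis applies to its own update laws, which are not reproduced here.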