Yongwei Zhang, Shunchao Zhang, Bo Zhao, Derong Liu
2020 2nd International Conference on Industrial Artificial Intelligence (IAI), published 2020-10-23. DOI: 10.1109/IAI50351.2020.9262213 (https://doi.org/10.1109/IAI50351.2020.9262213)
Model-Free Control of Time-Delay Systems via Policy Gradient Based Adaptive Learning Algorithm
This paper develops a model-free optimal control scheme for discrete-time nonlinear systems with time delays, using the policy-gradient-based adaptive learning (PGAL) algorithm. The PGAL algorithm designs an optimal controller for such systems from measured data. Compared with traditional adaptive dynamic programming algorithms, the proposed method is data-based and improves the control input with the policy gradient. Convergence of the PGAL algorithm is proved by showing that the value function converges to the optimum. To implement the algorithm, an actor-critic framework is constructed to learn the optimal control law and the value function. Finally, a simulation example demonstrates the effectiveness of the developed method.
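The abstract gives no update equations, so the following is only a minimal sketch of the general idea (a critic fitted from measured transitions, an actor improved by a gradient step on the critic), not the authors' PGAL implementation. Everything named here is an assumption made for illustration: the scalar plant with a one-step state delay and its coefficients, the quadratic stage cost, the linear actor `u = -K z`, the quadratic critic, and the learning rate. A linear plant is used so that a quadratic critic is exact; the paper itself treats nonlinear systems.

```python
import numpy as np

rng = np.random.default_rng(0)

def plant(x, x_prev, u):
    """Hypothetical delayed plant x[k+1] = 0.8 x[k] + 0.1 x[k-1] + u[k].
    The learner never reads these coefficients; it only sees the output."""
    return 0.8 * x + 0.1 * x_prev + u

def stage_cost(x, u):
    """Assumed quadratic stage cost."""
    return x**2 + u**2

def features(z, u):
    """Quadratic monomials of (x[k], x[k-1], u[k]) used by the critic."""
    v = np.array([z[0], z[1], u])
    return np.array([v[i] * v[j] for i in range(3) for j in range(i, 3)])

def evaluate_critic(K, n_samples=200):
    """Fit Q(z, u) = theta . features(z, u) for the policy u = -K z by
    least squares on the Bellman equation, using measured transitions."""
    rows, targets, states = [], [], []
    for _ in range(n_samples):
        z = rng.uniform(-1, 1, size=2)      # z = (x[k], x[k-1])
        u = rng.uniform(-1, 1)              # exploratory input
        x_next = plant(z[0], z[1], u)       # measured data
        z_next = np.array([x_next, z[0]])
        u_next = -K @ z_next                # action of the current policy
        rows.append(features(z, u) - features(z_next, u_next))
        targets.append(stage_cost(z[0], u))
        states.append(z)
    theta, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return theta, np.array(states)

def actor_step(K, theta, states, lr=0.1):
    """One gradient-descent step on Q(z, -K z) with respect to K."""
    u = -states @ K
    # dQ/du from the quadratic critic (theta[2], theta[4], theta[5] multiply
    # x[k]*u, x[k-1]*u and u*u respectively).
    dq_du = theta[2] * states[:, 0] + theta[4] * states[:, 1] + 2 * theta[5] * u
    grad_K = np.mean(dq_du[:, None] * (-states), axis=0)  # chain rule: du/dK = -z
    return K - lr * grad_K

def train(iterations=100):
    K = np.zeros(2)  # zero input is admissible because this plant is stable
    for _ in range(iterations):
        theta, states = evaluate_critic(K)
        K = actor_step(K, theta, states)
    return K

def rollout_cost(K, x0=1.0, steps=300):
    """Accumulated cost of the policy u = -K z from x[0] = x0, x[-1] = 0."""
    x, x_prev, total = x0, 0.0, 0.0
    for _ in range(steps):
        u = -K @ np.array([x, x_prev])
        total += stage_cost(x, u)
        x, x_prev = plant(x, x_prev, u), x
    return total

K_learned = train()
print("learned gain:", K_learned)
print("cost with zero input:", rollout_cost(np.zeros(2)))
print("cost with learned gain:", rollout_cost(K_learned))
```

The delay is handled here by stacking the current and delayed state into `z`, which is one common choice and is assumed rather than taken from the paper. Comparing the two printed costs gives a quick check that the gradient steps on the critic actually improved the control law.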