{"title":"输出反馈强化q学习在未知离散线性系统最优二次跟踪控制中的应用","authors":"Guangyue Zhao, Weijie Sun, He Cai, Yunjian Peng","doi":"10.1109/ICARCV.2018.8581252","DOIUrl":null,"url":null,"abstract":"In this paper, a novel output feedback solution based on the Q-learning algorithm using the measured data is proposed for the linear quadratic tracking (LQT) problem of unknown discrete-time systems. To tackle this technical issue, an augmented system composed of the original controlled system and the linear command generator is first constructed. Then, by using the past input, output, and reference trajectory data of the augmented system, the output feedback Q-learning scheme is able to learn the optimal tracking controller online without requiring any knowledge of the augmented system dynamics. Learning algorithms including both policy iteration (PI) and value iteration (VI) algorithms are developed to converge to the optimal solution. Finally, simulation results are provided to verify the effectiveness of the proposed scheme.","PeriodicalId":395380,"journal":{"name":"2018 15th International Conference on Control, Automation, Robotics and Vision (ICARCV)","volume":"89 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Output Feedback Reinforcement Q-learning for Optimal Quadratic Tracking Control of Unknown Discrete-Time Linear Systems and Its Application\",\"authors\":\"Guangyue Zhao, Weijie Sun, He Cai, Yunjian Peng\",\"doi\":\"10.1109/ICARCV.2018.8581252\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, a novel output feedback solution based on the Q-learning algorithm using the measured data is proposed for the linear quadratic tracking (LQT) problem of unknown discrete-time systems. To tackle this technical issue, an augmented system composed of the original controlled system and the linear command generator is first constructed. Then, by using the past input, output, and reference trajectory data of the augmented system, the output feedback Q-learning scheme is able to learn the optimal tracking controller online without requiring any knowledge of the augmented system dynamics. Learning algorithms including both policy iteration (PI) and value iteration (VI) algorithms are developed to converge to the optimal solution. 
Finally, simulation results are provided to verify the effectiveness of the proposed scheme.\",\"PeriodicalId\":395380,\"journal\":{\"name\":\"2018 15th International Conference on Control, Automation, Robotics and Vision (ICARCV)\",\"volume\":\"89 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 15th International Conference on Control, Automation, Robotics and Vision (ICARCV)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICARCV.2018.8581252\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 15th International Conference on Control, Automation, Robotics and Vision (ICARCV)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICARCV.2018.8581252","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
In this paper, a novel output feedback solution based on Q-learning and driven purely by measured data is proposed for the linear quadratic tracking (LQT) problem of unknown discrete-time systems. To tackle this problem, an augmented system composed of the original controlled system and a linear command generator is first constructed. Then, using past input, output, and reference trajectory data of the augmented system, the output feedback Q-learning scheme learns the optimal tracking controller online without requiring any knowledge of the augmented system dynamics. Both policy iteration (PI) and value iteration (VI) learning algorithms are developed and shown to converge to the optimal solution. Finally, simulation results verify the effectiveness of the proposed scheme.
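For context, the LQT setup the abstract describes is conventionally written as follows. The symbols here (A, B, C for the plant, F for the command generator, Q_y and R for the weights, and discount factor γ) are assumptions for illustration and may differ from the paper's exact notation:

```latex
% Plant: x_{k+1} = A x_k + B u_k,  y_k = C x_k;  reference: r_{k+1} = F r_k.
\begin{aligned}
X_k &= \begin{bmatrix} x_k \\ r_k \end{bmatrix}, \qquad
X_{k+1} = T X_k + B_1 u_k, \qquad
T = \begin{bmatrix} A & 0 \\ 0 & F \end{bmatrix}, \qquad
B_1 = \begin{bmatrix} B \\ 0 \end{bmatrix}, \\[2pt]
J(X_k) &= \sum_{i=k}^{\infty} \gamma^{\,i-k}
  \left[ (y_i - r_i)^{\top} Q_y \,(y_i - r_i) + u_i^{\top} R\, u_i \right]
  = \sum_{i=k}^{\infty} \gamma^{\,i-k}
  \left[ X_i^{\top} Q_1 X_i + u_i^{\top} R\, u_i \right], \\[2pt]
Q_1 &= \begin{bmatrix} C & -I \end{bmatrix}^{\top} Q_y
       \begin{bmatrix} C & -I \end{bmatrix},
\qquad
\mathcal{Q}(X_k, u_k) = \begin{bmatrix} X_k \\ u_k \end{bmatrix}^{\top} H
  \begin{bmatrix} X_k \\ u_k \end{bmatrix},
\qquad
u_k^{*} = -H_{uu}^{-1} H_{uX}\, X_k .
\end{aligned}
```

Once the quadratic kernel H of the Q-function is identified from data, the optimal gain follows from its partition alone, which is what removes the need for a model of T and B_1.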
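The sketch below illustrates the Q-learning PI step in this setting. It is a simplified *state-feedback* variant, not the paper's output-feedback scheme (which reconstructs the state from past input, output, and reference data); all matrices, gains, and dimensions are hypothetical, and the plant model is used only to generate data that the learner consumes:

```python
import numpy as np

# Hypothetical plant, command generator, and weights. The learner below never
# reads A, B, C, or F; they only simulate the measured data stream.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
F = np.array([[1.0]])                      # constant reference: r_{k+1} = r_k
Qy, R, gamma = np.array([[10.0]]), np.array([[1.0]]), 0.9

# Augmented system X = [x; r] with stage cost X'Q1X + u'Ru.
T = np.block([[A, np.zeros((2, 1))], [np.zeros((1, 2)), F]])
B1 = np.vstack([B, np.zeros((1, 1))])
Ci = np.hstack([C, -np.eye(1)])
Q1 = Ci.T @ Qy @ Ci

n, m = T.shape[0], B1.shape[1]             # augmented state / input dimensions
p = n + m                                  # size of z = [X; u]

def features(z):
    """Quadratic basis so that features(z) @ theta == z' H z, H symmetric."""
    outer = np.outer(z, z) * (2.0 - np.eye(p))   # off-diagonal terms count twice
    return outer[np.triu_indices(p)]

def unpack(theta):
    """Rebuild the symmetric Q-function kernel H from its parameter vector."""
    U = np.zeros((p, p))
    U[np.triu_indices(p)] = theta
    return U + U.T - np.diag(np.diag(U))

rng = np.random.default_rng(0)
K = np.zeros((m, n))        # initial gain; must stabilize the discounted system
for it in range(10):
    Phi, cost = [], []
    X = np.array([1.0, -1.0, 0.5])                 # arbitrary start [x; r]
    for k in range(400):
        u = K @ X + 0.5 * rng.standard_normal(m)   # probing noise for excitation
        Xn = T @ X + B1 @ u
        # Bellman: Q(X,u) = cost(X,u) + gamma * Q(X', K X'), linear in theta.
        Phi.append(features(np.concatenate([X, u]))
                   - gamma * features(np.concatenate([Xn, K @ Xn])))
        cost.append(X @ Q1 @ X + u @ R @ u)
        X = Xn
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(cost), rcond=None)
    H = unpack(theta)
    K = -np.linalg.solve(H[n:, n:], H[n:, :n])     # greedy policy improvement
    print(f"iteration {it}: K = {K.ravel()}")
```

With sufficient probing noise, the least-squares fit identifies the Q-function kernel of the current policy and the greedy update realizes one PI step; a VI variant would instead bootstrap each target from the previous H estimate rather than fully evaluating the policy between updates.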