Discrete-time optimal control scheme based on Q-learning algorithm

2016 Seventh International Conference on Intelligent Control and Information Processing (ICICIP), Pub Date: 2016-12-01, DOI: 10.1109/ICICIP.2016.7885888

Qinglai Wei, Derong Liu, Ruizhuo Song


Citations: 1

Abstract


Discrete-time optimal control scheme based on Q-learning algorithm

This paper addresses optimal control problems for discrete-time nonlinear systems using a novel Q-learning algorithm. In the newly developed algorithm, each iteration updates the iterative Q function over the whole state and control spaces, rather than at a single state-control pair. A new convergence criterion for the algorithm is presented, under which the traditional constraints on the learning rates of Q-learning algorithms are relaxed. Finally, simulation results are provided to illustrate the good performance of the developed algorithm.
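The idea of updating the Q function over the whole state and control spaces in each iteration can be illustrated with a minimal sketch. This is not the authors' algorithm, whose exact update rule and convergence criterion are not given in the abstract. It assumes a hypothetical toy system (integer states from -3 to 3, controls -1, 0, 1, saturated additive dynamics), a quadratic utility, a constant learning rate `alpha`, and a relaxed update of the form Q_{i+1}(x, u) = (1 - alpha) Q_i(x, u) + alpha [U(x, u) + min_{u'} Q_i(F(x, u), u')]. All names below are illustrative.

```python
from itertools import product

STATES = range(-3, 4)   # hypothetical discretized state space
CONTROLS = (-1, 0, 1)   # hypothetical discretized control space


def step(x, u):
    """Toy dynamics x_{k+1} = F(x_k, u_k), saturated to stay on the grid."""
    return max(-3, min(3, x + u))


def utility(x, u):
    """Quadratic stage cost U(x, u); an assumption made for illustration."""
    return x * x + u * u


def sweep(q, alpha):
    """One iteration: update Q at every (x, u) pair using the previous Q."""
    new_q = {}
    for x, u in product(STATES, CONTROLS):
        x_next = step(x, u)
        target = utility(x, u) + min(q[(x_next, v)] for v in CONTROLS)
        new_q[(x, u)] = (1 - alpha) * q[(x, u)] + alpha * target
    return new_q


def q_learning(alpha=0.5, sweeps=200):
    """Start from Q_0 = 0 and apply full-space sweeps with a constant rate."""
    q = {(x, u): 0.0 for x, u in product(STATES, CONTROLS)}
    for _ in range(sweeps):
        q = sweep(q, alpha)
    return q


def greedy_control(q, x):
    """Control that minimizes the learned Q function at state x."""
    return min(CONTROLS, key=lambda u: q[(x, u)])


if __name__ == "__main__":
    q = q_learning()
    for x in STATES:
        print(x, greedy_control(q, x))
```

In contrast to the classical scheme, which adjusts one visited state-control pair per step, `sweep` builds the next Q function from the previous one at every pair on the grid. In this toy problem the learned policy steers the state toward the origin one unit per step.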

