{"title":"模型学习演员评价算法:运动控制任务中的性能评估","authors":"I. Grondman, L. Buşoniu, Robert Babuška","doi":"10.1109/CDC.2012.6426427","DOIUrl":null,"url":null,"abstract":"Reinforcement learning (RL) control provides a means to deal with uncertainty and nonlinearity associated with control tasks in an optimal way. The class of actor-critic RL algorithms proved useful for control systems with continuous state and input variables. In the literature, model-based actor-critic algorithms have recently been introduced to considerably speed up the the learning by constructing online a model through local linear regression (LLR). It has not been analyzed yet whether the speed-up is due to the model learning structure or the LLR approximator. Therefore, in this paper we generalize the model learning actor-critic algorithms to make them suitable for use with an arbitrary function approximator. Furthermore, we present the results of an extensive analysis through numerical simulations of a typical nonlinear motion control problem. The LLR approximator is compared with radial basis functions (RBFs) in terms of the initial convergence rate and in terms of the final performance obtained. The results show that LLR-based actor-critic RL outperforms the RBF counterpart: it gives quick initial learning and comparable or even superior final control performance.","PeriodicalId":312426,"journal":{"name":"2012 IEEE 51st IEEE Conference on Decision and Control (CDC)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"26","resultStr":"{\"title\":\"Model learning actor-critic algorithms: Performance evaluation in a motion control task\",\"authors\":\"I. Grondman, L. Buşoniu, Robert Babuška\",\"doi\":\"10.1109/CDC.2012.6426427\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Reinforcement learning (RL) control provides a means to deal with uncertainty and nonlinearity associated with control tasks in an optimal way. The class of actor-critic RL algorithms proved useful for control systems with continuous state and input variables. In the literature, model-based actor-critic algorithms have recently been introduced to considerably speed up the the learning by constructing online a model through local linear regression (LLR). It has not been analyzed yet whether the speed-up is due to the model learning structure or the LLR approximator. Therefore, in this paper we generalize the model learning actor-critic algorithms to make them suitable for use with an arbitrary function approximator. Furthermore, we present the results of an extensive analysis through numerical simulations of a typical nonlinear motion control problem. The LLR approximator is compared with radial basis functions (RBFs) in terms of the initial convergence rate and in terms of the final performance obtained. 
The results show that LLR-based actor-critic RL outperforms the RBF counterpart: it gives quick initial learning and comparable or even superior final control performance.\",\"PeriodicalId\":312426,\"journal\":{\"name\":\"2012 IEEE 51st IEEE Conference on Decision and Control (CDC)\",\"volume\":\"41 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-12-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"26\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2012 IEEE 51st IEEE Conference on Decision and Control (CDC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CDC.2012.6426427\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 IEEE 51st IEEE Conference on Decision and Control (CDC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CDC.2012.6426427","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Reinforcement learning (RL) control provides a means to deal with the uncertainty and nonlinearity associated with control tasks in an optimal way. The class of actor-critic RL algorithms has proved useful for control systems with continuous state and input variables. In the literature, model-based actor-critic algorithms have recently been introduced that considerably speed up learning by constructing a model online through local linear regression (LLR). It has not yet been analyzed whether this speed-up is due to the model-learning structure or to the LLR approximator. In this paper we therefore generalize the model-learning actor-critic algorithms so that they can be used with an arbitrary function approximator. Furthermore, we present the results of an extensive analysis through numerical simulations of a typical nonlinear motion control problem. The LLR approximator is compared with radial basis functions (RBFs) in terms of both the initial convergence rate and the final performance obtained. The results show that LLR-based actor-critic RL outperforms the RBF counterpart: it gives quick initial learning and comparable or even superior final control performance.
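To make the comparison in the abstract concrete, the sketch below shows a minimal Python implementation of the two approximator classes under discussion: a memory-based k-nearest-neighbour local linear regression (LLR) approximator and a fixed Gaussian RBF network with incrementally updated weights. This is not the authors' implementation; the class names, the nearest-neighbour memory scheme, and all parameter values (k, memory size, RBF width, learning rate) are illustrative assumptions. Either object could play the role of the actor, critic, or learned process model in a generic actor-critic loop that calls predict() and feeds back training targets.

```python
# Minimal sketch (illustrative, not the paper's code) of the two approximators
# compared in the paper: memory-based LLR and a fixed Gaussian RBF network.
import numpy as np


class LLRApproximator:
    """k-nearest-neighbour local linear regression over a sample memory."""

    def __init__(self, k=10, max_samples=2000):
        self.k = k
        self.max_samples = max_samples
        self.inputs = []   # stored input points
        self.targets = []  # stored target values

    def add_sample(self, x, y):
        self.inputs.append(np.asarray(x, dtype=float))
        self.targets.append(float(y))
        if len(self.inputs) > self.max_samples:  # drop the oldest sample
            self.inputs.pop(0)
            self.targets.pop(0)

    def predict(self, x):
        if not self.inputs:
            return 0.0
        X = np.array(self.inputs)
        y = np.array(self.targets)
        # Select the k nearest stored samples to the query point.
        dist = np.linalg.norm(X - np.asarray(x, dtype=float), axis=1)
        idx = np.argsort(dist)[: min(self.k, len(dist))]
        # Fit a local affine model y ~= [x, 1] @ beta on those samples.
        A = np.hstack([X[idx], np.ones((len(idx), 1))])
        beta, *_ = np.linalg.lstsq(A, y[idx], rcond=None)
        return float(np.append(np.asarray(x, dtype=float), 1.0) @ beta)


class RBFApproximator:
    """Linear-in-parameters approximator with fixed Gaussian RBF features."""

    def __init__(self, centers, width, alpha=0.1):
        self.centers = np.asarray(centers, dtype=float)
        self.width = width
        self.alpha = alpha  # gradient-step learning rate (assumed value)
        self.theta = np.zeros(len(self.centers))

    def features(self, x):
        d2 = np.sum((self.centers - np.asarray(x, dtype=float)) ** 2, axis=1)
        return np.exp(-d2 / (2.0 * self.width ** 2))

    def predict(self, x):
        return float(self.features(x) @ self.theta)

    def update(self, x, target):
        # Stochastic-gradient step towards the supplied target value.
        phi = self.features(x)
        self.theta += self.alpha * (target - self.predict(x)) * phi
```

The contrast the paper evaluates is visible in this sketch: LLR stores raw samples and fits a local affine model at query time, so it can adapt immediately to newly observed data, whereas the RBF network keeps a fixed basis and adjusts its weights only incrementally, which tends to make initial learning slower.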