Model learning actor-critic algorithms: Performance evaluation in a motion control task

I. Grondman, L. Buşoniu, Robert Babuška
{"title":"模型学习演员评价算法:运动控制任务中的性能评估","authors":"I. Grondman, L. Buşoniu, Robert Babuška","doi":"10.1109/CDC.2012.6426427","DOIUrl":null,"url":null,"abstract":"Reinforcement learning (RL) control provides a means to deal with uncertainty and nonlinearity associated with control tasks in an optimal way. The class of actor-critic RL algorithms proved useful for control systems with continuous state and input variables. In the literature, model-based actor-critic algorithms have recently been introduced to considerably speed up the the learning by constructing online a model through local linear regression (LLR). It has not been analyzed yet whether the speed-up is due to the model learning structure or the LLR approximator. Therefore, in this paper we generalize the model learning actor-critic algorithms to make them suitable for use with an arbitrary function approximator. Furthermore, we present the results of an extensive analysis through numerical simulations of a typical nonlinear motion control problem. The LLR approximator is compared with radial basis functions (RBFs) in terms of the initial convergence rate and in terms of the final performance obtained. The results show that LLR-based actor-critic RL outperforms the RBF counterpart: it gives quick initial learning and comparable or even superior final control performance.","PeriodicalId":312426,"journal":{"name":"2012 IEEE 51st IEEE Conference on Decision and Control (CDC)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"26","resultStr":"{\"title\":\"Model learning actor-critic algorithms: Performance evaluation in a motion control task\",\"authors\":\"I. Grondman, L. Buşoniu, Robert Babuška\",\"doi\":\"10.1109/CDC.2012.6426427\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Reinforcement learning (RL) control provides a means to deal with uncertainty and nonlinearity associated with control tasks in an optimal way. The class of actor-critic RL algorithms proved useful for control systems with continuous state and input variables. In the literature, model-based actor-critic algorithms have recently been introduced to considerably speed up the the learning by constructing online a model through local linear regression (LLR). It has not been analyzed yet whether the speed-up is due to the model learning structure or the LLR approximator. Therefore, in this paper we generalize the model learning actor-critic algorithms to make them suitable for use with an arbitrary function approximator. Furthermore, we present the results of an extensive analysis through numerical simulations of a typical nonlinear motion control problem. The LLR approximator is compared with radial basis functions (RBFs) in terms of the initial convergence rate and in terms of the final performance obtained. 
The results show that LLR-based actor-critic RL outperforms the RBF counterpart: it gives quick initial learning and comparable or even superior final control performance.\",\"PeriodicalId\":312426,\"journal\":{\"name\":\"2012 IEEE 51st IEEE Conference on Decision and Control (CDC)\",\"volume\":\"41 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-12-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"26\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2012 IEEE 51st IEEE Conference on Decision and Control (CDC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CDC.2012.6426427\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 IEEE 51st IEEE Conference on Decision and Control (CDC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CDC.2012.6426427","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 26

Abstract

Reinforcement learning (RL) control provides a means to deal with the uncertainty and nonlinearity associated with control tasks in an optimal way. The class of actor-critic RL algorithms has proved useful for control systems with continuous state and input variables. In the literature, model-based actor-critic algorithms have recently been introduced that considerably speed up learning by constructing a model online through local linear regression (LLR). It has not yet been analyzed whether the speed-up is due to the model learning structure or to the LLR approximator. Therefore, in this paper we generalize the model learning actor-critic algorithms to make them suitable for use with an arbitrary function approximator. Furthermore, we present the results of an extensive analysis through numerical simulations of a typical nonlinear motion control problem. The LLR approximator is compared with radial basis functions (RBFs) in terms of the initial convergence rate and the final performance obtained. The results show that LLR-based actor-critic RL outperforms the RBF counterpart: it gives quick initial learning and comparable or even superior final control performance.
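To make the comparison concrete, below is a minimal sketch (Python/NumPy) of a standard actor-critic loop with a pluggable function approximator, so that an RBF network and an LLR memory can be swapped in without changing the learning loop. This is not the authors' exact algorithm: the pendulum dynamics, reward, memory size, neighbor count, and learning rates are illustrative assumptions, not the paper's experimental setup.

```python
# Minimal actor-critic sketch with a pluggable function approximator.
# NOT the paper's exact algorithm; dynamics, reward and all gains are
# illustrative assumptions chosen only to make the example runnable.
import numpy as np

class RBFApproximator:
    """Linear-in-the-weights approximator with fixed Gaussian RBF features."""
    def __init__(self, centers, sigma):
        self.centers = np.asarray(centers)   # (n_rbf, state_dim)
        self.sigma = sigma
        self.w = np.zeros(len(self.centers))

    def _features(self, x):
        d2 = np.sum((self.centers - x) ** 2, axis=1)
        return np.exp(-d2 / (2.0 * self.sigma ** 2))

    def predict(self, x):
        return float(self._features(x) @ self.w)

    def update(self, x, error, alpha):
        # Gradient step on the squared prediction error.
        self.w += alpha * error * self._features(x)

class LLRApproximator:
    """Memory-based local linear regression: at each query, fit an affine
    model to the k nearest stored samples. One hypothetical variant;
    insertion and forgetting strategies differ across implementations."""
    def __init__(self, k=5, max_size=500):
        self.k, self.max_size = k, max_size
        self.X, self.y = [], []

    def predict(self, x):
        if len(self.X) < self.k:
            return 0.0
        X, y = np.asarray(self.X), np.asarray(self.y)
        idx = np.argsort(np.sum((X - x) ** 2, axis=1))[: self.k]
        A = np.hstack([X[idx], np.ones((self.k, 1))])     # affine features
        beta, *_ = np.linalg.lstsq(A, y[idx], rcond=None)
        return float(np.append(x, 1.0) @ beta)

    def update(self, x, error, alpha):
        # Insert a corrected sample; drop the oldest when memory is full.
        target = self.predict(x) + alpha * error
        self.X.append(np.asarray(x, dtype=float))
        self.y.append(target)
        if len(self.X) > self.max_size:
            self.X.pop(0); self.y.pop(0)

def pendulum_step(x, u, dt=0.05):
    """Illustrative pendulum dynamics; state x = [angle, angular velocity]."""
    th, om = x
    om = om + dt * (9.81 * np.sin(th) - 0.1 * om + u)
    th = th + dt * om
    return np.array([(th + np.pi) % (2 * np.pi) - np.pi, om])

def actor_critic(critic, actor, episodes=50, steps=100, gamma=0.97,
                 alpha_c=0.2, alpha_a=0.05, noise=1.0, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(episodes):
        x = np.array([np.pi, 0.0])                 # start hanging down
        for _ in range(steps):
            du = noise * rng.standard_normal()     # exploration term
            u = float(np.clip(actor.predict(x) + du, -3.0, 3.0))
            x_next = pendulum_step(x, u)
            r = -(x_next[0] ** 2 + 0.1 * x_next[1] ** 2)
            # Temporal-difference error drives both updates.
            delta = r + gamma * critic.predict(x_next) - critic.predict(x)
            critic.update(x, delta, alpha_c)
            # Reinforce exploration noise that produced a positive TD error.
            actor.update(x, delta * du, alpha_a)
            x = x_next

# Swap the approximator without touching the learning loop:
centers = [[a, b] for a in np.linspace(-np.pi, np.pi, 7)
                  for b in np.linspace(-8.0, 8.0, 7)]
actor_critic(RBFApproximator(centers, sigma=1.0),
             RBFApproximator(centers, sigma=1.0))
actor_critic(LLRApproximator(), LLRApproximator())
```

With this interface, the two approximators differ only in how they generalize: the RBF network adjusts global weights over fixed features, while the LLR memory fits a fresh local model at every query, which is consistent with the quick initial learning the abstract attributes to LLR.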