Nonlinear Disturbance Compensation and Reference Tracking via Reinforcement Learning with Fuzzy Approximators

Y. Bayiz, Robert Babuška
{"title":"Nonlinear Disturbance Compensation and Reference Tracking via Reinforcement Learning with Fuzzy Approximators","authors":"Y. Bayiz, Robert Babuška","doi":"10.3182/20140824-6-ZA-1003.02511","DOIUrl":null,"url":null,"abstract":"Abstract Reinforcement Learning (RL) algorithms can learn optimal control laws for nonlinear dynamic systems without relying on a mathematical model of the system to be controlled. While RL can in principle discover control laws from scratch, by solely interacting with the process, in practice this does not yield any significant advantages. Learning control laws from scratch is lengthy and may lead to system damage due to the trial and error nature of the learning process. In this paper, we adopt a different and largely unexplored approach: a nominal control law is used to achieve reasonable, yet suboptimal, performance and a RL agent is trained to act as a nonlinear compensator whose task is to improve upon the performance of the nominal controller. The RL agent learns by means of an actor-critic algorithm using a plant model acquired on-line, alongside the critic and actor. Fuzzy approximators are employed to represent all the adjustable components of the learning scheme. One advantage of fuzzy approximators is the straightforward way in which they allow for the inclusion of prior knowledge. The proposed control scheme is applied to a reference tracking problem of 1-DOF robot arm influenced by an unknown payload disturbance due to gravity. The nominal controller is a PD controller, which is unable to properly compensate the effect of the disturbance considered. Simulation results indicate that the novel method is able to learn to compensate the disturbance for any reference angle varying throughout the experiment.","PeriodicalId":13260,"journal":{"name":"IFAC Proceedings Volumes","volume":"32 1","pages":"5393-5398"},"PeriodicalIF":0.0000,"publicationDate":"2014-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IFAC Proceedings Volumes","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3182/20140824-6-ZA-1003.02511","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 10

Abstract

Reinforcement Learning (RL) algorithms can learn optimal control laws for nonlinear dynamic systems without relying on a mathematical model of the system to be controlled. While RL can in principle discover control laws from scratch, by solely interacting with the process, in practice this does not yield any significant advantages. Learning control laws from scratch is lengthy and may lead to system damage due to the trial-and-error nature of the learning process. In this paper, we adopt a different and largely unexplored approach: a nominal control law is used to achieve reasonable, yet suboptimal, performance, and an RL agent is trained to act as a nonlinear compensator whose task is to improve upon the performance of the nominal controller. The RL agent learns by means of an actor-critic algorithm using a plant model acquired on-line, alongside the critic and actor. Fuzzy approximators are employed to represent all the adjustable components of the learning scheme. One advantage of fuzzy approximators is the straightforward way in which they allow for the inclusion of prior knowledge. The proposed control scheme is applied to a reference tracking problem for a 1-DOF robot arm influenced by an unknown payload disturbance due to gravity. The nominal controller is a PD controller, which is unable to properly compensate for the effect of the disturbance considered. Simulation results indicate that the novel method is able to learn to compensate for the disturbance at any reference angle, with the reference varying throughout the experiment.
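The abstract only outlines the control scheme. A minimal sketch of the underlying idea, a nominal PD action augmented by a learned fuzzy compensator trained with a temporal-difference actor-critic, could look like the Python fragment below. All plant parameters, the fuzzy partition, the gains, and the use of a simple model-free TD update in place of the paper's model-based actor-critic are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

# Normalized triangular fuzzy basis over a 1-D input (assumed partition;
# the paper's exact membership functions are not specified here).
def fuzzy_basis(x, centers):
    width = centers[1] - centers[0]
    phi = np.maximum(0.0, 1.0 - np.abs(x - centers) / width)
    s = phi.sum()
    return phi / s if s > 0 else phi

# 1-DOF arm with a gravity payload term (illustrative parameters only).
def arm_step(theta, omega, u, dt=0.01, m=1.0, l=0.5, b=0.1, g=9.81):
    domega = (u - b * omega - m * g * l * np.sin(theta)) / (m * l**2)
    omega += dt * domega
    theta += dt * omega
    return theta, omega

# Nominal PD controller: by itself it cannot cancel the gravity disturbance.
def pd_control(err, derr, kp=5.0, kd=1.0):
    return kp * err + kd * derr

# Actor and critic as linear-in-parameters fuzzy approximators of the error.
centers = np.linspace(-np.pi, np.pi, 21)
w_critic = np.zeros_like(centers)      # V(e) ~ w_critic . phi(e)
w_actor = np.zeros_like(centers)       # u_rl(e) ~ w_actor . phi(e)
alpha_c, alpha_a, gamma, sigma = 0.1, 0.05, 0.97, 0.3

theta, omega, theta_ref = 0.0, 0.0, 1.0
for k in range(20000):
    err = theta_ref - theta
    phi = fuzzy_basis(err, centers)

    # Composite control: nominal PD action plus the learned compensation,
    # with Gaussian exploration on the RL part.
    u_nom = pd_control(err, -omega)
    u_rl = float(w_actor @ phi) + np.random.randn() * sigma
    theta, omega = arm_step(theta, omega, u_nom + u_rl)

    # Reward penalizes the tracking error; the TD error drives both updates.
    err_next = theta_ref - theta
    phi_next = fuzzy_basis(err_next, centers)
    r = -err_next**2
    delta = r + gamma * float(w_critic @ phi_next) - float(w_critic @ phi)
    w_critic += alpha_c * delta * phi
    w_actor += alpha_a * delta * (u_rl - float(w_actor @ phi)) * phi
```

In this sketch the compensator only sees the tracking error; the paper's scheme additionally learns a plant model on-line and uses it in the actor-critic updates, which the simplified TD rule above does not capture.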