Sampled-data control through model-free reinforcement learning with effective experience replay

Bo Xiao, H.K. Lam, Xiaojie Su, Ziwei Wang, Frank P.-W. Lo, Shihong Chen, Eric Yeatman
{"title":"采样数据控制通过无模型强化学习与有效的经验回放","authors":"Bo Xiao ,&nbsp;H.K. Lam ,&nbsp;Xiaojie Su ,&nbsp;Ziwei Wang ,&nbsp;Frank P.-W. Lo ,&nbsp;Shihong Chen ,&nbsp;Eric Yeatman","doi":"10.1016/j.jai.2023.100018","DOIUrl":null,"url":null,"abstract":"<div><p>Reinforcement Learning (RL) based control algorithms can learn the control strategies for nonlinear and uncertain environment during interacting with it. Guided by the rewards generated by environment, a RL agent can learn the control strategy directly in a model-free way instead of investigating the dynamic model of the environment. In the paper, we propose the sampled-data RL control strategy to reduce the computational demand. In the sampled-data control strategy, the whole control system is of a hybrid structure, in which the plant is of continuous structure while the controller (RL agent) adopts a discrete structure. Given that the continuous states of the plant will be the input of the agent, the state–action value function is approximated by the fully connected feed-forward neural networks (FCFFNN). Instead of learning the controller at every step during the interaction with the environment, the learning and acting stages are decoupled to learn the control strategy more effectively through experience replay. In the acting stage, the most effective experience obtained during the interaction with the environment will be stored and during the learning stage, the stored experience will be replayed to customized times, which helps enhance the experience replay process.</p><p>The effectiveness of proposed approach will be verified by simulation examples.</p></div>","PeriodicalId":100755,"journal":{"name":"Journal of Automation and Intelligence","volume":"2 1","pages":"Pages 20-30"},"PeriodicalIF":0.0000,"publicationDate":"2023-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Sampled-data control through model-free reinforcement learning with effective experience replay\",\"authors\":\"Bo Xiao ,&nbsp;H.K. Lam ,&nbsp;Xiaojie Su ,&nbsp;Ziwei Wang ,&nbsp;Frank P.-W. Lo ,&nbsp;Shihong Chen ,&nbsp;Eric Yeatman\",\"doi\":\"10.1016/j.jai.2023.100018\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Reinforcement Learning (RL) based control algorithms can learn the control strategies for nonlinear and uncertain environment during interacting with it. Guided by the rewards generated by environment, a RL agent can learn the control strategy directly in a model-free way instead of investigating the dynamic model of the environment. In the paper, we propose the sampled-data RL control strategy to reduce the computational demand. In the sampled-data control strategy, the whole control system is of a hybrid structure, in which the plant is of continuous structure while the controller (RL agent) adopts a discrete structure. Given that the continuous states of the plant will be the input of the agent, the state–action value function is approximated by the fully connected feed-forward neural networks (FCFFNN). Instead of learning the controller at every step during the interaction with the environment, the learning and acting stages are decoupled to learn the control strategy more effectively through experience replay. 
In the acting stage, the most effective experience obtained during the interaction with the environment will be stored and during the learning stage, the stored experience will be replayed to customized times, which helps enhance the experience replay process.</p><p>The effectiveness of proposed approach will be verified by simulation examples.</p></div>\",\"PeriodicalId\":100755,\"journal\":{\"name\":\"Journal of Automation and Intelligence\",\"volume\":\"2 1\",\"pages\":\"Pages 20-30\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-02-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Automation and Intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2949855423000011\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Automation and Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2949855423000011","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3

Abstract


Reinforcement Learning (RL) based control algorithms can learn control strategies for nonlinear and uncertain environments while interacting with them. Guided by the rewards generated by the environment, an RL agent can learn the control strategy directly in a model-free way, without investigating the dynamic model of the environment. In this paper, we propose a sampled-data RL control strategy to reduce the computational demand. Under the sampled-data control strategy, the overall control system has a hybrid structure: the plant evolves in continuous time while the controller (the RL agent) operates in discrete time. Since the continuous states of the plant serve as the input to the agent, the state–action value function is approximated by a fully connected feed-forward neural network (FCFFNN). Instead of updating the controller at every step of the interaction with the environment, the learning and acting stages are decoupled so that the control strategy is learned more effectively through experience replay. In the acting stage, the most effective experience obtained during the interaction with the environment is stored; in the learning stage, the stored experience is replayed a customized number of times, which enhances the experience replay process.
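To make the value-function approximation concrete, the sketch below shows one way the state–action value function could be represented by an FCFFNN that takes the plant's continuous sampled state and outputs a value for each candidate control action. The PyTorch framework, layer widths, discrete action set, and epsilon-greedy selection are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of an FCFFNN state-action value approximator for a
# sampled-data RL controller (PyTorch, layer sizes and action set assumed).
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """FCFFNN mapping a continuous plant state to Q-values for each discrete action."""
    def __init__(self, state_dim: int, num_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_actions),  # one value per candidate control action
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def select_action(q_net: QNetwork, state: torch.Tensor, epsilon: float = 0.1) -> int:
    """Epsilon-greedy action choice at a sampling instant; the chosen action
    would be held constant over the following sampling period."""
    num_actions = q_net.net[-1].out_features
    if torch.rand(1).item() < epsilon:                 # exploration
        return int(torch.randint(num_actions, (1,)).item())
    with torch.no_grad():                              # exploitation
        return int(q_net(state).argmax().item())
```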

The effectiveness of the proposed approach is verified by simulation examples.
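As a rough illustration of the decoupled acting and learning stages with the effective experience replay described above, the sketch below stores only the highest-reward transitions during the acting stage and replays each stored transition a customized number of times during the learning stage. The buffer capacity, ranking rule, and replay count are illustrative assumptions rather than the paper's exact algorithm.

```python
# Sketch of effective experience replay: keep the most effective transitions
# gathered while acting, then replay them a customized number of times while
# learning. Capacity, ranking rule, and replay count are assumptions.

class EffectiveReplayBuffer:
    def __init__(self, capacity: int = 1000):
        self.capacity = capacity
        self.buffer = []  # list of (reward, transition) pairs

    def store(self, transition, reward: float) -> None:
        """Acting stage: retain only the highest-reward (most effective) transitions."""
        self.buffer.append((reward, transition))
        self.buffer.sort(key=lambda item: item[0], reverse=True)
        del self.buffer[self.capacity:]

    def replay(self, times: int):
        """Learning stage: yield every stored transition `times` times."""
        for _ in range(times):
            for _, transition in self.buffer:
                yield transition

# Hypothetical usage in the learning stage (names are placeholders):
# for state, action, reward, next_state in buffer.replay(times=5):
#     perform a Q-learning update of the FCFFNN with this transition
```

In a full sampled-data implementation, the acting stage would apply the selected action to the continuous-time plant, hold it over each sampling period, and store the resulting sampled transitions before the learning stage replays them.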
