Sampled-data control through model-free reinforcement learning with effective experience replay

Bo Xiao, H.K. Lam, Xiaojie Su, Ziwei Wang, Frank P.-W. Lo, Shihong Chen, Eric Yeatman
{"title":"采样数据控制通过无模型强化学习与有效的经验回放","authors":"Bo Xiao ,&nbsp;H.K. Lam ,&nbsp;Xiaojie Su ,&nbsp;Ziwei Wang ,&nbsp;Frank P.-W. Lo ,&nbsp;Shihong Chen ,&nbsp;Eric Yeatman","doi":"10.1016/j.jai.2023.100018","DOIUrl":null,"url":null,"abstract":"<div><p>Reinforcement Learning (RL) based control algorithms can learn the control strategies for nonlinear and uncertain environment during interacting with it. Guided by the rewards generated by environment, a RL agent can learn the control strategy directly in a model-free way instead of investigating the dynamic model of the environment. In the paper, we propose the sampled-data RL control strategy to reduce the computational demand. In the sampled-data control strategy, the whole control system is of a hybrid structure, in which the plant is of continuous structure while the controller (RL agent) adopts a discrete structure. Given that the continuous states of the plant will be the input of the agent, the state–action value function is approximated by the fully connected feed-forward neural networks (FCFFNN). Instead of learning the controller at every step during the interaction with the environment, the learning and acting stages are decoupled to learn the control strategy more effectively through experience replay. In the acting stage, the most effective experience obtained during the interaction with the environment will be stored and during the learning stage, the stored experience will be replayed to customized times, which helps enhance the experience replay process.</p><p>The effectiveness of proposed approach will be verified by simulation examples.</p></div>","PeriodicalId":100755,"journal":{"name":"Journal of Automation and Intelligence","volume":"2 1","pages":"Pages 20-30"},"PeriodicalIF":0.0000,"publicationDate":"2023-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Sampled-data control through model-free reinforcement learning with effective experience replay\",\"authors\":\"Bo Xiao ,&nbsp;H.K. Lam ,&nbsp;Xiaojie Su ,&nbsp;Ziwei Wang ,&nbsp;Frank P.-W. Lo ,&nbsp;Shihong Chen ,&nbsp;Eric Yeatman\",\"doi\":\"10.1016/j.jai.2023.100018\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Reinforcement Learning (RL) based control algorithms can learn the control strategies for nonlinear and uncertain environment during interacting with it. Guided by the rewards generated by environment, a RL agent can learn the control strategy directly in a model-free way instead of investigating the dynamic model of the environment. In the paper, we propose the sampled-data RL control strategy to reduce the computational demand. In the sampled-data control strategy, the whole control system is of a hybrid structure, in which the plant is of continuous structure while the controller (RL agent) adopts a discrete structure. Given that the continuous states of the plant will be the input of the agent, the state–action value function is approximated by the fully connected feed-forward neural networks (FCFFNN). Instead of learning the controller at every step during the interaction with the environment, the learning and acting stages are decoupled to learn the control strategy more effectively through experience replay. 
In the acting stage, the most effective experience obtained during the interaction with the environment will be stored and during the learning stage, the stored experience will be replayed to customized times, which helps enhance the experience replay process.</p><p>The effectiveness of proposed approach will be verified by simulation examples.</p></div>\",\"PeriodicalId\":100755,\"journal\":{\"name\":\"Journal of Automation and Intelligence\",\"volume\":\"2 1\",\"pages\":\"Pages 20-30\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-02-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Automation and Intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2949855423000011\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Automation and Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2949855423000011","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3

Abstract


Reinforcement Learning (RL) based control algorithms can learn control strategies for nonlinear and uncertain environments while interacting with them. Guided by the rewards generated by the environment, an RL agent can learn the control strategy directly in a model-free way, without investigating the dynamic model of the environment. In this paper, we propose a sampled-data RL control strategy to reduce the computational demand. Under the sampled-data control strategy, the overall control system has a hybrid structure: the plant evolves in continuous time while the controller (the RL agent) operates in discrete time. Since the continuous states of the plant serve as the input to the agent, the state–action value function is approximated by a fully connected feed-forward neural network (FCFFNN). Instead of updating the controller at every step of the interaction with the environment, the learning and acting stages are decoupled so that the control strategy is learned more effectively through experience replay. In the acting stage, the most effective experience obtained during the interaction with the environment is stored; in the learning stage, the stored experience is replayed a customized number of times, which enhances the experience replay process.
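To make the value-function approximation concrete, the sketch below shows one way the state–action value function could be represented by an FCFFNN that takes the plant's continuous sampled state and outputs a value for each candidate control action. The PyTorch framework, layer widths, discrete action set, and epsilon-greedy selection are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of an FCFFNN state-action value approximator for a
# sampled-data RL controller (PyTorch, layer sizes and action set assumed).
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """FCFFNN mapping a continuous plant state to Q-values for each discrete action."""
    def __init__(self, state_dim: int, num_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_actions),  # one value per candidate control action
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def select_action(q_net: QNetwork, state: torch.Tensor, epsilon: float = 0.1) -> int:
    """Epsilon-greedy action choice at a sampling instant; the chosen action
    would be held constant over the following sampling period."""
    num_actions = q_net.net[-1].out_features
    if torch.rand(1).item() < epsilon:                 # exploration
        return int(torch.randint(num_actions, (1,)).item())
    with torch.no_grad():                              # exploitation
        return int(q_net(state).argmax().item())
```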

The effectiveness of the proposed approach is verified by simulation examples.
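As a rough illustration of the decoupled acting and learning stages with the effective experience replay described above, the sketch below stores only the highest-reward transitions during the acting stage and replays each stored transition a customized number of times during the learning stage. The buffer capacity, ranking rule, and replay count are illustrative assumptions rather than the paper's exact algorithm.

```python
# Sketch of effective experience replay: keep the most effective transitions
# gathered while acting, then replay them a customized number of times while
# learning. Capacity, ranking rule, and replay count are assumptions.

class EffectiveReplayBuffer:
    def __init__(self, capacity: int = 1000):
        self.capacity = capacity
        self.buffer = []  # list of (reward, transition) pairs

    def store(self, transition, reward: float) -> None:
        """Acting stage: retain only the highest-reward (most effective) transitions."""
        self.buffer.append((reward, transition))
        self.buffer.sort(key=lambda item: item[0], reverse=True)
        del self.buffer[self.capacity:]

    def replay(self, times: int):
        """Learning stage: yield every stored transition `times` times."""
        for _ in range(times):
            for _, transition in self.buffer:
                yield transition

# Hypothetical usage in the learning stage (names are placeholders):
# for state, action, reward, next_state in buffer.replay(times=5):
#     perform a Q-learning update of the FCFFNN with this transition
```

In a full sampled-data implementation, the acting stage would apply the selected action to the continuous-time plant, hold it over each sampling period, and store the resulting sampled transitions before the learning stage replays them.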
