Air Spring Controlled by Reinforcement Learning Algorithm

Engineering Mechanics 2020 Pub Date : 1900-01-01 DOI:10.21495/5896-3-428

J. Rágulík, M. Sivčák

Citations: 0

Abstract

The paper deals with replacing the analog PID stroke controller of a bellows pneumatic spring with a machine learning algorithm, specifically deep reinforcement learning. The Deep Deterministic Policy Gradient (DDPG) algorithm is used. Its setting consists of an environment, in this case the pneumatic spring, and an agent that observes the environment and performs actions so as to maximize the cumulative reward. DDPG is an actor-critic algorithm that combines the benefits of Q-learning with the optimization of a deterministic policy. Q-learning is represented by the critic, while policy optimization is represented by the actor, which maps the state of the environment directly to actions. Both the critic and the actor are implemented as deep neural networks, and each has a target copy of itself. These target networks are intended to improve the stability and speed of the learning process. DDPG also uses a replay buffer, from which the data the agent learns from are sampled in batches.
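The replay buffer, the target networks and the critic's learning target mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `ReplayBuffer`, `soft_update` and `td_target`, the flat list of parameters standing in for network weights, and the parameters `tau` (target update rate) and `gamma` (discount factor) are assumptions made here for illustration, following the usual formulation of DDPG.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity):
        # A bounded deque drops the oldest transition once capacity is reached.
        self.storage = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement; this reduces the correlation
        # between consecutive transitions in a training batch.
        return rng.sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)


def soft_update(target_params, online_params, tau):
    """Move target parameters slowly towards the online parameters.

    Implements target <- tau * online + (1 - tau) * target, element by element.
    Plain lists of floats stand in for the network weights (an assumption).
    """
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]


def td_target(reward, next_q, done, gamma):
    """Regression target for the critic: y = r + gamma * Q_target(s', a').

    next_q is the target critic's value for the next state and the target
    actor's action; no bootstrapping is applied when the episode has ended.
    """
    return reward if done else reward + gamma * next_q
```

In a training loop, each interaction with the spring model would be stored with `add`, a batch drawn with `sample` once the buffer holds enough transitions, the critic fitted to `td_target`, and both target networks refreshed with `soft_update` using a small `tau`, so that the targets change slowly between updates.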
