AIR SPRING CONTROLLED BY REINFORCEMENT LEARNING ALGORITHM
J. Rágulík, M. Sivčák
Engineering Mechanics 2020
DOI: 10.21495/5896-3-428
The paper deals with replacing the analog PID stroke controller of a bellows pneumatic spring with a machine learning algorithm, specifically deep reinforcement learning. The Deep Deterministic Policy Gradient (DDPG) setup used here consists of an environment, in this case the pneumatic spring, and an agent that observes the environment and takes actions so as to maximize the cumulative reward. DDPG belongs to the family of actor-critic algorithms. It combines the benefits of Q-learning with the optimization of a deterministic policy. Q-learning is represented by the critic, while policy optimization is represented by the actor, which maps the state of the environment directly to actions. In deep reinforcement learning, both the critic and the actor are deep neural networks, and each has a target copy of itself. These target networks are intended to increase the stability and speed of the learning process. The DDPG algorithm also uses a replay buffer, from which the agent's training data are sampled in batches.
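The DDPG building blocks named in the abstract (replay buffer, critic trained toward a target-network estimate, deterministic actor updated through the critic's action gradient, and softly updated target copies) can be sketched in plain Python. This is a minimal illustration under loud assumptions, not the authors' implementation. The state is assumed to be a scalar stroke error and the action a scalar, saturated valve command. A quadratic critic and a linear actor stand in for the deep neural networks. The plant dynamics, reward, and every name (`ReplayBuffer`, `ddpg_update`, `critic_target`, and so on) are hypothetical.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity FIFO store of (s, a, r, s_next, done) transitions."""

    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)  # oldest transitions are evicted first

    def add(self, s, a, r, s_next, done):
        self.data.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        # Uniformly random mini-batch without replacement
        return random.sample(list(self.data), batch_size)

    def __len__(self):
        return len(self.data)


def features(s, a):
    # Hypothetical quadratic critic features (stand-in for the critic network)
    return [s * s, s * a, a * a, s, a, 1.0]


def q_value(w, s, a):
    return sum(wi * fi for wi, fi in zip(w, features(s, a)))


def dq_da(w, s, a):
    # Exact derivative of the quadratic critic with respect to the action
    return w[1] * s + 2.0 * w[2] * a + w[4]


def actor(theta, s):
    # Hypothetical linear deterministic policy mu(s) = k * s (stand-in for the actor network)
    return theta[0] * s


def soft_update(target, source, tau):
    # Polyak averaging: target <- tau * source + (1 - tau) * target
    return [tau * x + (1.0 - tau) * y for x, y in zip(source, target)]


def critic_target(w_targ, theta_targ, r, s_next, done, gamma):
    # y = r + gamma * Q'(s', mu'(s')), bootstrapping only from non-terminal states
    if done:
        return r
    return r + gamma * q_value(w_targ, s_next, actor(theta_targ, s_next))


def ddpg_update(w, theta, w_targ, theta_targ, batch,
                gamma=0.99, lr_critic=1e-3, lr_actor=1e-4, tau=0.005):
    """One DDPG step on a mini-batch; returns updated (w, theta, w_targ, theta_targ)."""
    n = len(batch)
    # Critic: gradient descent on the mean of (Q(s, a) - y)^2 / 2
    grad_w = [0.0] * len(w)
    for s, a, r, s_next, done in batch:
        err = q_value(w, s, a) - critic_target(w_targ, theta_targ, r, s_next, done, gamma)
        for i, f in enumerate(features(s, a)):
            grad_w[i] += err * f / n
    w = [wi - lr_critic * gi for wi, gi in zip(w, grad_w)]
    # Actor: deterministic policy gradient dQ/da * dmu/dk, with dmu/dk = s (gradient ascent)
    grad_k = sum(dq_da(w, s, actor(theta, s)) * s for s, _, _, _, _ in batch) / n
    theta = [theta[0] + lr_actor * grad_k]
    # Target networks slowly track the learned parameters
    return w, theta, soft_update(w_targ, w, tau), soft_update(theta_targ, theta, tau)


if __name__ == "__main__":
    random.seed(0)
    buf = ReplayBuffer(capacity=10_000)
    w, theta = [0.0] * 6, [0.0]
    w_targ, theta_targ = list(w), list(theta)
    s = 1.0  # hypothetical normalised stroke error
    for t in range(2000):
        # Exploration noise on the action, clipped to a hypothetical valve range [-1, 1]
        a = max(-1.0, min(1.0, actor(theta, s) + random.gauss(0.0, 0.1)))
        s_next = s + 0.1 * a  # hypothetical toy dynamics, not the real air spring
        r = -(s_next ** 2) - 0.01 * a ** 2
        buf.add(s, a, r, s_next, False)
        s = s_next if abs(s_next) < 2.0 else random.uniform(-1.0, 1.0)
        if len(buf) >= 64:
            w, theta, w_targ, theta_targ = ddpg_update(
                w, theta, w_targ, theta_targ, buf.sample(64))
    print("learned gain k =", theta[0])
```

The critic's target value is computed with the target copies rather than the weights currently being trained, which is what the abstract credits with stabilizing learning. The small `tau` keeps those copies lagging behind. Because the buffer samples uniformly from past transitions, each mini-batch mixes old and recent experience instead of feeding the learner consecutive, strongly correlated steps.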