Reinforcement Learning for Cart Pole Inverted Pendulum System
Authors: A. Surriani, O. Wahyunggoro, A. Cahyadi
Published in: 2021 IEEE Industrial Electronics and Applications Conference (IEACon), 2021-11-22
DOI: https://doi.org/10.1109/IEACon51066.2021.9654440
Reinforcement learning has recently become a method of choice for many problems. One challenging class of problems is the control of dynamic systems. This paper uses policy gradient (PG) methods to balance a cart-pole inverted pendulum, that is, to keep the pole upright by moving the cart. Two policy gradient-based algorithms are employed: REINFORCE and PG with a baseline. The results show that PG with a baseline completes training episodes faster than REINFORCE, whereas REINFORCE achieves a higher cumulative reward.
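To make the difference between the two algorithms concrete, the sketch below computes the per-step weights that multiply the log-probability gradients in a policy gradient update. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which baseline was used, so the mean episode return stands in for it here, and the discount factor `gamma`, the reward of +1 per step, and the function names `discounted_returns` and `policy_gradient_weights` are all hypothetical choices made for this example.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = np.zeros(len(rewards), dtype=float)
    running = 0.0
    # Walk the episode backwards so each return reuses the one after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def policy_gradient_weights(rewards, gamma=0.99, use_baseline=False):
    """Per-step weights applied to grad log pi(a_t | s_t).

    use_baseline=False: plain REINFORCE, weight is the return G_t.
    use_baseline=True:  weight is G_t - b, where b is assumed here to be
                        the mean return of the episode (a common simple
                        choice, not confirmed by the source).
    """
    returns = discounted_returns(rewards, gamma)
    if use_baseline:
        return returns - returns.mean()
    return returns


if __name__ == "__main__":
    # Assumed reward scheme: +1 for every step the pole stays up.
    episode_rewards = [1.0, 1.0, 1.0]
    print(policy_gradient_weights(episode_rewards, gamma=0.5))
    print(policy_gradient_weights(episode_rewards, gamma=0.5, use_baseline=True))
```

Subtracting a baseline leaves the expected gradient unchanged while typically reducing its variance. That is the usual motivation for the second algorithm, although the abstract itself only reports the difference in training speed and cumulative reward.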