Least Square Reinforcement Learning for Solving Inverted Pendulum Problem
Satta Panyakaew, Papangkorn Inkeaw, Jakramate Bootkrajang, Jeerayut Chaijaruwanich
2018 3rd International Conference on Computer and Communication Systems (ICCCS), April 2018
DOI: 10.1109/CCOMS.2018.8463234
The inverted pendulum is a classic control problem that can be solved with reinforcement learning. Most previous work considers the problem in a discrete state space; only a few exceptions assume a continuous state domain. In this paper, we consider the cart-pole balancing problem in a continuous state space with a constrained track length. We adopt a least-squares temporal-difference reinforcement learning algorithm to learn the controller. A new reward function is then proposed to better reflect the nature of the task. In addition, we study various factors that play important roles in the success of learning. Empirical studies validate the effectiveness of our method.
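To illustrate the least-squares temporal-difference (LSTD) idea the abstract refers to, the sketch below runs LSTD(0) policy evaluation on a toy one-dimensional system. This is not the paper's controller: the feature map `phi`, the linear dynamics in `step`, and the deviation-penalizing reward are all hypothetical stand-ins chosen to keep the example self-contained. LSTD accumulates the matrix A = Σ φ(s)(φ(s) − γφ(s'))ᵀ and vector b = Σ r φ(s) over observed transitions, then solves A w = b for the linear value-function weights in one step, rather than updating incrementally as TD(0) does.

```python
# Minimal LSTD(0) policy-evaluation sketch. Features, dynamics, and reward
# are illustrative assumptions, not the paper's actual setup.
import random

GAMMA = 0.95  # discount factor


def phi(s):
    # Hand-crafted 2-d feature vector for a scalar state (e.g. pole angle).
    return [1.0, s]


def step(s):
    # Toy stable linear dynamics with noise; reward penalizes deviation
    # from the upright (s = 0) position.
    s_next = 0.9 * s + random.gauss(0.0, 0.01)
    reward = -abs(s_next)
    return s_next, reward


def lstd(num_steps=5000, seed=0):
    random.seed(seed)
    # Accumulate A = sum phi (phi - gamma*phi')^T and b = sum r*phi (2x2 case).
    A = [[0.0, 0.0], [0.0, 0.0]]
    b = [0.0, 0.0]
    s = random.uniform(-0.1, 0.1)
    for _ in range(num_steps):
        s_next, r = step(s)
        f, f_next = phi(s), phi(s_next)
        for i in range(2):
            for j in range(2):
                A[i][j] += f[i] * (f[j] - GAMMA * f_next[j])
            b[i] += r * f[i]
        s = s_next
    # Solve A w = b via the closed-form 2x2 inverse.
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    w = [(A[1][1] * b[0] - A[0][1] * b[1]) / det,
         (A[0][0] * b[1] - A[1][0] * b[0]) / det]
    return w  # weights of the linear value estimate V(s) = w[0] + w[1]*s


w = lstd()
```

Because every reward here is non-positive, the fitted value at the upright state, `w[0]`, comes out negative; a full LSPI-style controller would alternate such evaluation steps with greedy policy improvement over a discrete action set.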