Least Square Reinforcement Learning for Solving Inverted Pendulum Problem

Satta Panyakaew, Papangkorn Inkeaw, Jakramate Bootkrajang, Jeerayut Chaijaruwanich
Published in: 2018 3rd International Conference on Computer and Communication Systems (ICCCS)
Publication date: 2018-04-01
DOI: 10.1109/CCOMS.2018.8463234 (https://doi.org/10.1109/CCOMS.2018.8463234)
Citations: 2

Abstract

The inverted pendulum is one of the classic control problems that can be solved by a reinforcement learning approach. Most previous work considers the problem in a discrete state space, with only a few exceptions that assume a continuous state domain. In this paper, we consider the problem of cart-pole balancing in a continuous state space with a constrained track length. We adopt a least-squares temporal difference reinforcement learning algorithm to learn the controller. A new reward function is then proposed to better reflect the nature of the task. In addition, we study various factors that play important roles in the success of learning. Empirical studies validate the effectiveness of our method.
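For readers unfamiliar with least-squares temporal difference (LSTD) learning, the following is a minimal sketch of the generic LSTD weight estimate for a linear value function. It is not the authors' implementation: the abstract does not specify their features, reward function, or hyperparameters, so the function names `lstd` and `rbf_features`, the radial basis features, the ridge term, and the toy two-state chain used as a check are all assumptions made here for illustration.

```python
import numpy as np


def lstd(phi, phi_next, rewards, gamma=0.95, ridge=1e-6):
    """Estimate weights w so that V(s) ~= phi(s) @ w from sampled transitions.

    phi, phi_next: arrays of shape (n_samples, n_features) holding the
        features of each state and of its successor state.
    rewards: array of shape (n_samples,).
    For a terminal transition, pass an all-zero row in phi_next.
    """
    phi = np.asarray(phi, dtype=float)
    phi_next = np.asarray(phi_next, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    # A = sum_t phi_t (phi_t - gamma * phi'_t)^T and b = sum_t phi_t * r_t
    A = phi.T @ (phi - gamma * phi_next)
    b = phi.T @ rewards
    # Assumed regularizer: a small ridge term keeps A invertible when some
    # features are rarely active in the sampled data.
    A = A + ridge * np.eye(A.shape[0])
    return np.linalg.solve(A, b)


def rbf_features(state, centers, width=1.0):
    """Hypothetical feature map for a continuous state: a constant bias
    followed by one Gaussian radial basis function per center."""
    state = np.asarray(state, dtype=float)
    centers = np.asarray(centers, dtype=float)
    sq_dist = np.sum((centers - state) ** 2, axis=1)
    return np.concatenate(([1.0], np.exp(-sq_dist / (2.0 * width ** 2))))


# Toy check on a two-state chain with one-hot features (not cart-pole):
# state 0 -> state 1 with reward 0, state 1 -> state 1 with reward 1.
# With gamma = 0.5 the true values are V(0) = 1 and V(1) = 2.
phi = np.array([[1.0, 0.0], [0.0, 1.0]])
phi_next = np.array([[0.0, 1.0], [0.0, 1.0]])
rewards = np.array([0.0, 1.0])
w = lstd(phi, phi_next, rewards, gamma=0.5)  # approximately [1.0, 2.0]
```

In a cart-pole setting, each row of `phi` would instead come from a feature map such as `rbf_features` applied to the continuous state (for example cart position, cart velocity, pole angle, and pole angular velocity), with the centers and width chosen by the practitioner.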