Integrating symmetry of environment by designing special basis functions for value function approximation in reinforcement learning

Guo-fang Wang, Zhou Fang, Bo Li, Ping Li
DOI: 10.1109/ICARCV.2016.7838691
Venue: 2016 14th International Conference on Control, Automation, Robotics and Vision (ICARCV)
Published: November 2016
Citations: 2

Abstract

Reinforcement learning (RL) is usually regarded as tabula rasa learning: the agent must explore the environment at random, and the resulting time cost and data inefficiency hinder RL's practical application. To accelerate learning and improve data efficiency, this paper extends the definition of symmetry from finite state spaces to infinite state spaces and proposes a special type of symmetric basis function for value function approximation, which integrates prior knowledge of the environment's symmetry for large or even infinite state spaces. As an example, this approximation structure is incorporated into the policy-evaluation phase of Least-Squares Policy Iteration (LSPI), yielding symmetric LSPI (S-LSPI), whose convergence properties are analyzed. Simulation results on chain walk and inverted-pendulum balancing show that, compared with regular LSPI (R-LSPI), S-LSPI converges much faster while its computational burden decreases significantly. These results illustrate how symmetric basis functions capture the symmetry of the environment and, as a case study, show the promise of integrating environmental symmetry into an RL agent.
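The abstract does not spell out the paper's exact construction, but one common way to build symmetric basis functions is to symmetrize an ordinary feature map: given a state symmetry σ (here assumed to be reflection, σ(s) = -s, as in inverted-pendulum balancing), define ψ_i(s) = φ_i(s) + φ_i(σ(s)), so that any linear value estimate over ψ automatically satisfies V(s) = V(σ(s)). The sketch below illustrates this with Gaussian radial basis functions; the function names and the choice of RBFs are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def rbf_features(s, centers, width=0.5):
    # Gaussian radial basis functions evaluated at scalar state s.
    return np.exp(-((s - centers) ** 2) / (2 * width ** 2))

def symmetric_features(s, centers, width=0.5):
    # Symmetrized basis: psi_i(s) = phi_i(s) + phi_i(sigma(s)) with sigma(s) = -s.
    # Any linear value estimate w @ psi then satisfies V(s) = V(-s) by construction.
    return rbf_features(s, centers, width) + rbf_features(-s, centers, width)

centers = np.linspace(-1.0, 1.0, 5)
w = np.random.default_rng(0).normal(size=centers.size)

v_pos = w @ symmetric_features(0.3, centers)
v_neg = w @ symmetric_features(-0.3, centers)
# Symmetry holds for any weight vector, so a policy-evaluation step such as
# LSPI's LSTDQ only needs to fit weights on one half of the state space.
assert np.isclose(v_pos, v_neg)
```

Because the symmetry is baked into the features rather than learned from data, the number of effective parameters is roughly halved, which is consistent with the reported speed-up of S-LSPI over R-LSPI.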