Integrating symmetry of environment by designing special basis functions for value function approximation in reinforcement learning

Guo-fang Wang, Zhou Fang, Bo Li, Ping Li
DOI: 10.1109/ICARCV.2016.7838691
Venue: 2016 14th International Conference on Control, Automation, Robotics and Vision (ICARCV)
Published: November 2016
Citations: 2

Abstract

Reinforcement learning (RL) is usually regarded as tabula rasa learning: the agent must explore the environment at random, and the resulting time cost and data inefficiency hinder RL's practical application. To accelerate learning and improve data efficiency, this paper extends the definition of symmetry from finite state spaces to infinite state spaces and proposes a special type of symmetric basis function for value function approximation, which integrates prior knowledge of the environment's symmetry for large or even infinite state spaces. As an example, this approximation structure is incorporated into the policy-evaluation phase of Least-Squares Policy Iteration (LSPI), yielding symmetric LSPI (S-LSPI), whose convergence properties are analyzed. Simulation results on chain walk and inverted-pendulum balancing show that, compared with regular LSPI (R-LSPI), S-LSPI converges much faster while its computational burden decreases significantly. These results illustrate how symmetric basis functions capture the symmetry of the environment and, as a case study, show the promise of integrating environmental symmetry into an RL agent.
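The abstract does not spell out the paper's exact construction, but one common way to build symmetric basis functions is to symmetrize an ordinary feature map: given a state symmetry σ (here assumed to be reflection, σ(s) = -s, as in inverted-pendulum balancing), define ψ_i(s) = φ_i(s) + φ_i(σ(s)), so that any linear value estimate over ψ automatically satisfies V(s) = V(σ(s)). The sketch below illustrates this with Gaussian radial basis functions; the function names and the choice of RBFs are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def rbf_features(s, centers, width=0.5):
    # Gaussian radial basis functions evaluated at scalar state s.
    return np.exp(-((s - centers) ** 2) / (2 * width ** 2))

def symmetric_features(s, centers, width=0.5):
    # Symmetrized basis: psi_i(s) = phi_i(s) + phi_i(sigma(s)) with sigma(s) = -s.
    # Any linear value estimate w @ psi then satisfies V(s) = V(-s) by construction.
    return rbf_features(s, centers, width) + rbf_features(-s, centers, width)

centers = np.linspace(-1.0, 1.0, 5)
w = np.random.default_rng(0).normal(size=centers.size)

v_pos = w @ symmetric_features(0.3, centers)
v_neg = w @ symmetric_features(-0.3, centers)
# Symmetry holds for any weight vector, so a policy-evaluation step such as
# LSPI's LSTDQ only needs to fit weights on one half of the state space.
assert np.isclose(v_pos, v_neg)
```

Because the symmetry is baked into the features rather than learned from data, the number of effective parameters is roughly halved, which is consistent with the reported speed-up of S-LSPI over R-LSPI.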