{"title":"通过设计特殊的基函数对环境的对称性进行积分,实现强化学习中的值函数逼近","authors":"Guo-fang Wang, Zhou Fang, Bo Li, Ping Li","doi":"10.1109/ICARCV.2016.7838691","DOIUrl":null,"url":null,"abstract":"Reinforcement learning (RL) is usually regarded as tabula rasa learning, and the agent needs to randomly explore the environment, so the time consuming and data inefficiency will hinder RL from the real application. In order to accelerate learning speed and improve data efficiency, in this paper we expand the symmetry definition from finite state space to infinite state space and then propose designing a special type of symmetric basis functions for value function approximation to integrate the prior knowledge of symmetry about the environment for large or even infinite state space. After that, as an example, this particular approximate structure is incorporated into the policy evaluation phase of Least-Square Policy Iteration (LSPI), which we call symmetric LSPI (S-LSPI) and the convergence property is analyzed. Simulation results of chain walk and inverted pendulum balancing demonstrate that in contrast with regular LSPI (R-LSPI), the convergence speed of S-LSPI increases greatly and the computational burden decreases significantly simultaneously. It can illustrate the use of symmetric basis functions to capture the property of symmetry very well, and as a case study, it shows the promise to integrate symmetry of environment into RL agent.","PeriodicalId":128828,"journal":{"name":"2016 14th International Conference on Control, Automation, Robotics and Vision (ICARCV)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Integrating symmetry of environment by designing special basis functions for value function approximation in reinforcement learning\",\"authors\":\"Guo-fang Wang, Zhou Fang, Bo Li, Ping Li\",\"doi\":\"10.1109/ICARCV.2016.7838691\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Reinforcement learning (RL) is usually regarded as tabula rasa learning, and the agent needs to randomly explore the environment, so the time consuming and data inefficiency will hinder RL from the real application. In order to accelerate learning speed and improve data efficiency, in this paper we expand the symmetry definition from finite state space to infinite state space and then propose designing a special type of symmetric basis functions for value function approximation to integrate the prior knowledge of symmetry about the environment for large or even infinite state space. After that, as an example, this particular approximate structure is incorporated into the policy evaluation phase of Least-Square Policy Iteration (LSPI), which we call symmetric LSPI (S-LSPI) and the convergence property is analyzed. Simulation results of chain walk and inverted pendulum balancing demonstrate that in contrast with regular LSPI (R-LSPI), the convergence speed of S-LSPI increases greatly and the computational burden decreases significantly simultaneously. 
It can illustrate the use of symmetric basis functions to capture the property of symmetry very well, and as a case study, it shows the promise to integrate symmetry of environment into RL agent.\",\"PeriodicalId\":128828,\"journal\":{\"name\":\"2016 14th International Conference on Control, Automation, Robotics and Vision (ICARCV)\",\"volume\":\"23 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 14th International Conference on Control, Automation, Robotics and Vision (ICARCV)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICARCV.2016.7838691\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 14th International Conference on Control, Automation, Robotics and Vision (ICARCV)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICARCV.2016.7838691","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Reinforcement learning (RL) is usually regarded as tabula rasa learning: the agent must explore the environment at random, and the resulting time cost and data inefficiency hinder RL in real applications. To accelerate learning and improve data efficiency, this paper extends the definition of symmetry from finite to infinite state spaces and proposes a special type of symmetric basis function for value function approximation, so that prior knowledge of the environment's symmetry can be integrated even for large or infinite state spaces. As an example, this approximation structure is incorporated into the policy-evaluation phase of Least-Squares Policy Iteration (LSPI), yielding symmetric LSPI (S-LSPI), whose convergence properties are analyzed. Simulation results on chain walk and inverted-pendulum balancing show that, compared with regular LSPI (R-LSPI), S-LSPI converges much faster while imposing a significantly lower computational burden. These results illustrate that symmetric basis functions capture the symmetry property well and, as a case study, show the promise of integrating environmental symmetry into an RL agent.
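
As a rough illustration of the idea (a minimal sketch, not the paper's exact construction): if the optimal value function is known to satisfy Q*(s, a) = Q*(σ_S(s), σ_A(a)) for some state-action symmetry σ, then averaging an ordinary feature vector over that symmetry yields basis functions whose span contains only symmetric Q-functions. The Python sketch below assumes a mirror symmetry σ_S(s) = -s with swapped actions (as in inverted-pendulum balancing); the names rbf_basis, symmetric_basis, and lstdq are illustrative, and the LSTDQ step stands in for the policy-evaluation phase of LSPI.

```python
import numpy as np

def rbf_basis(s, a, centers, width, n_actions):
    """Ordinary radial-basis features for a scalar state s and discrete action a."""
    feats = np.zeros(len(centers) * n_actions)
    block = np.exp(-((s - centers) ** 2) / (2.0 * width ** 2))
    feats[a * len(centers):(a + 1) * len(centers)] = block
    return feats

def symmetric_basis(s, a, centers, width, n_actions):
    """Symmetrized features: average ordinary features over the assumed symmetry
    (s, a) -> (-s, n_actions - 1 - a), so any Q = w . phi_sym automatically
    satisfies Q(s, a) = Q(-s, n_actions - 1 - a)."""
    phi = rbf_basis(s, a, centers, width, n_actions)
    phi_mirrored = rbf_basis(-s, n_actions - 1 - a, centers, width, n_actions)
    return 0.5 * (phi + phi_mirrored)

def lstdq(samples, basis, gamma, policy):
    """LSTDQ, the policy-evaluation step of LSPI: solve A w = b built from
    transition samples (s, a, r, s') under the policy being evaluated."""
    k = basis(*samples[0][:2]).shape[0]
    A = np.zeros((k, k))
    b = np.zeros(k)
    for s, a, r, s_next in samples:
        phi = basis(s, a)
        phi_next = basis(s_next, policy(s_next))
        A += np.outer(phi, phi - gamma * phi_next)
        b += r * phi
    return np.linalg.solve(A + 1e-6 * np.eye(k), b)  # small ridge term for numerical stability

# Hypothetical toy usage with the symmetric basis:
centers = np.linspace(-1.0, 1.0, 5)
basis = lambda s, a: symmetric_basis(s, a, centers, width=0.5, n_actions=2)
samples = [(0.3, 0, 1.0, 0.1), (-0.3, 1, 1.0, -0.1)]  # (s, a, r, s')
weights = lstdq(samples, basis, gamma=0.95, policy=lambda s: 0)
```

Because the symmetrized features are invariant under the mirror map, every weight vector returned by lstdq represents a Q-function that respects the assumed symmetry, which is one way such prior knowledge can shrink the effective hypothesis space the agent must search.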