{"title":"Periodic Guidance Learning","authors":"Lipeng Wan, Xuguang Lan, Xuwei Song, Chuzhen Feng, Nanning Zheng","doi":"10.1109/ICBK50248.2020.00021","DOIUrl":null,"url":null,"abstract":"Tasks with periodic states are widespread in reality. However, Current reinforcement learning (RL) algorithms generally treat such tasks as non-periodic Markov decision process, which results in low exploration efficiency and misleading advantage estimation with high variance. This paper proposes periodic guidance learning (PGL), in which a pruned advantage estimation with lower variance is implemented. Meanwhile, based on periodic states, past good experiences are utilized for better exploration. Our algorithm is evaluated on periodic tasks in MuJoCo. The experimental results show PGL method improves exploration efficiency and outperforms baselines in various periodic tasks. The results also show that PGL achieves a smooth policy optimization. Further experiments on the agent’s periodic behavior reveal the strong correlation between period length and the agents motion mode.","PeriodicalId":432857,"journal":{"name":"2020 IEEE International Conference on Knowledge Graph (ICKG)","volume":"89 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE International Conference on Knowledge Graph (ICKG)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICBK50248.2020.00021","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Tasks with periodic states are widespread in reality. However, current reinforcement learning (RL) algorithms generally treat such tasks as non-periodic Markov decision processes, which results in low exploration efficiency and misleading advantage estimation with high variance. This paper proposes periodic guidance learning (PGL), in which a pruned advantage estimation with lower variance is implemented. Meanwhile, based on periodic states, past good experiences are utilized for better exploration. Our algorithm is evaluated on periodic tasks in MuJoCo. The experimental results show that PGL improves exploration efficiency and outperforms the baselines on various periodic tasks. The results also show that PGL achieves smooth policy optimization. Further experiments on the agent's periodic behavior reveal a strong correlation between period length and the agent's motion mode.
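The abstract does not spell out how the "pruned" advantage estimation is computed. The sketch below is only one plausible reading of that phrase, assuming a GAE-style estimator whose recursion is cut at period boundaries so that credit does not propagate across periods; the function name pruned_gae, the period_ends flag, and the hyperparameter values are illustrative assumptions, not the paper's notation.

    # Illustrative sketch only: assumes a GAE-style advantage estimator that is
    # additionally truncated ("pruned") at period boundaries. This is an
    # interpretation of the abstract, not the paper's published algorithm.
    import numpy as np

    def pruned_gae(rewards, values, period_ends, gamma=0.99, lam=0.95):
        """Compute GAE advantages, resetting the accumulation at period boundaries.

        rewards:     array of shape (T,),     per-step rewards
        values:      array of shape (T + 1,), value estimates with a bootstrap value at the end
        period_ends: boolean array of shape (T,), True where a period completes
        """
        T = len(rewards)
        advantages = np.zeros(T)
        gae = 0.0
        for t in reversed(range(T)):
            # One-step TD residual at step t.
            delta = rewards[t] + gamma * values[t + 1] - values[t]
            # Stop propagating credit across the period boundary: the boundary is
            # treated like an episode end for the purpose of advantage estimation,
            # which shortens the accumulation window and lowers variance.
            gae = delta if period_ends[t] else delta + gamma * lam * gae
            advantages[t] = gae
        return advantages

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        T = 8
        rewards = rng.normal(size=T)
        values = rng.normal(size=T + 1)
        period_ends = np.array([False, False, False, True, False, False, False, True])
        print(pruned_gae(rewards, values, period_ends))

Compared with standard GAE, the only change in this sketch is the reset at period_ends; whether PGL prunes the estimator in exactly this way is not stated in the abstract.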