Joint Reinforcement Learning Method Based on Roulette Algorithm and Simulated Annealing Strategy

Authors: Huang Jin-bo, Yang Rui-jun, Cheng Yan
Venue: 2020 5th International Conference on Intelligent Informatics and Biomedical Sciences (ICIIBMS)
Published: 2020-11-18
DOI: 10.1109/iciibms50712.2020.9336206
Citations: 0
Abstract
A combined Q-learning and Sarsa algorithm based on a united simulated annealing strategy is proposed to address the slow convergence of traditional reinforcement learning algorithms. It balances the trade-off between trial-and-error exploration and efficiency across the two methods through a random dynamic adjustment factor. The roulette algorithm is used to improve Q-learning, the simulated annealing algorithm replaces the action-selection strategy of Sarsa, and the overall convergence rate of the combined algorithm is controlled by the annealing rate. Finally, the task of the reward function is subdivided, and a reward function based on action decomposition is designed. Simulation results show that the improved path planning method effectively reduces both the time cost and the number of collisions during the first path search.
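The two selection mechanisms named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify how Q-values are converted to roulette weights, so Boltzmann (softmax-style) weighting and a Metropolis acceptance rule are assumptions here, and both function names are hypothetical.

```python
import math
import random

def roulette_select(q_values, temperature=1.0):
    """Roulette-wheel (fitness-proportionate) action selection over Q-values.

    Boltzmann weighting exp(Q/T) is an assumption; the paper's exact
    weighting scheme is not given in the abstract.
    """
    weights = [math.exp(q / temperature) for q in q_values]
    total = sum(weights)
    # Spin the wheel: pick the action whose cumulative weight covers r.
    r = random.random() * total
    cumulative = 0.0
    for action, w in enumerate(weights):
        cumulative += w
        if r <= cumulative:
            return action
    return len(q_values) - 1  # guard against floating-point round-off

def anneal_select(q_values, temperature):
    """Simulated-annealing-style selection (assumed Metropolis criterion):
    propose a random action and accept it over the greedy one with
    probability exp(dQ / T) when it looks worse; T decays over episodes,
    so the policy cools from exploratory toward greedy.
    """
    greedy = max(range(len(q_values)), key=lambda a: q_values[a])
    candidate = random.randrange(len(q_values))
    delta = q_values[candidate] - q_values[greedy]
    if delta >= 0 or random.random() < math.exp(delta / max(temperature, 1e-9)):
        return candidate
    return greedy
```

As the temperature is lowered by the annealing rate, `anneal_select` accepts worse actions less often, which is how the annealing schedule would govern the overall convergence rate described above.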