Joint Reinforcement Learning Method Based on Roulette Algorithm and Simulated Annealing Strategy

Huang Jin-bo, Yang Rui-jun, Cheng Yan
{"title":"Joint Reinforcement Learning Method Based on Roulette Algorithm and Simulated Annealing Strategy","authors":"Huang Jin-bo, Yang Rui-jun, Cheng Yan","doi":"10.1109/iciibms50712.2020.9336206","DOIUrl":null,"url":null,"abstract":"A combined Q and Sarsa algorithm based on united simulated annealing strategy proposed in order to solve the problems that the convergence speed of traditional reinforcement learning algorithm is slow. It could balance the relationship between trial-error and efficiency among different methods by a random dynamic adjustment factor method. The roulette algorithm is used to improve Q-learning, and the simulated annealing algorithm is used to replace the selection strategy of Sarsa algorithm, and the overall convergence rate of the algorithm is controlled by the annealing rate. Finally, the task of reward function is subdivided, and the reward function based on action decomposition is designed. The simulation results show that the improved path planning method can effectively reduce the time cost and the collision times of the first path finding.","PeriodicalId":243033,"journal":{"name":"2020 5th International Conference on Intelligent Informatics and Biomedical Sciences (ICIIBMS)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 5th International Conference on Intelligent Informatics and Biomedical Sciences (ICIIBMS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/iciibms50712.2020.9336206","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

A combined Q-learning and Sarsa algorithm based on a united simulated annealing strategy is proposed to address the slow convergence of traditional reinforcement learning algorithms. A random dynamic adjustment factor balances the trade-off between trial-and-error exploration and efficiency across the two methods. The roulette algorithm is used to improve Q-learning, the simulated annealing algorithm replaces the selection strategy of the Sarsa algorithm, and the overall convergence rate is controlled by the annealing rate. Finally, the task of the reward function is subdivided, and a reward function based on action decomposition is designed. Simulation results show that the improved path planning method effectively reduces the time cost and the number of collisions during the first path search.
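For orientation, the sketch below illustrates the kind of joint scheme the abstract describes: roulette-wheel (fitness-proportionate) action selection over Q-values, a simulated-annealing (Metropolis-style) action choice whose temperature decays at the annealing rate, and a per-step switch between Q-learning and Sarsa update targets. The environment interface (env.reset/env.step), the table sizes, the coin-flip stand-in for the random dynamic adjustment factor, and all constants are illustrative assumptions, not the authors' implementation.

```python
import math
import random

import numpy as np

# Hypothetical problem sizes and hyperparameters (not from the paper).
N_STATES, N_ACTIONS = 100, 4
ALPHA, GAMMA = 0.1, 0.9           # learning rate, discount factor
T0, ANNEAL_RATE = 1.0, 0.995      # initial temperature, annealing rate

Q = np.zeros((N_STATES, N_ACTIONS))  # Q-table shared by both update rules


def roulette_action(q_row):
    """Roulette-wheel selection: pick an action with probability
    proportional to its (shifted, non-negative) Q-value."""
    weights = q_row - q_row.min() + 1e-6
    probs = weights / weights.sum()
    return int(np.random.choice(len(q_row), p=probs))


def annealed_action(q_row, temperature):
    """Simulated-annealing selection: accept a random, possibly worse
    action with Metropolis probability exp(delta / T); otherwise greedy."""
    greedy = int(np.argmax(q_row))
    candidate = random.randrange(len(q_row))
    delta = q_row[candidate] - q_row[greedy]
    if delta >= 0 or random.random() < math.exp(delta / max(temperature, 1e-8)):
        return candidate
    return greedy


def train_episode(env, temperature):
    """One episode of the joint update: each step, a simple coin flip
    (stand-in for the random dynamic adjustment factor) chooses between
    the Q-learning target and the Sarsa target."""
    s = env.reset()
    a = roulette_action(Q[s])
    done = False
    while not done:
        s_next, r, done = env.step(a)                 # assumed env interface
        a_next = annealed_action(Q[s_next], temperature)
        if random.random() < 0.5:
            target = r + GAMMA * Q[s_next].max()      # Q-learning (off-policy)
        else:
            target = r + GAMMA * Q[s_next, a_next]    # Sarsa (on-policy)
        Q[s, a] += ALPHA * (target - Q[s, a])
        s, a = s_next, a_next
    return temperature * ANNEAL_RATE                  # lower T => less exploration
```

Lowering the temperature by a fixed annealing rate after each episode is one simple way to let the annealing rate govern how quickly the policy shifts from exploration toward greedy behaviour, which is the role the abstract assigns to it.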