Joint Reinforcement Learning Method Based on Roulette Algorithm and Simulated Annealing Strategy

Huang Jin-bo, Yang Rui-jun, Cheng Yan
{"title":"Joint Reinforcement Learning Method Based on Roulette Algorithm and Simulated Annealing Strategy","authors":"Huang Jin-bo, Yang Rui-jun, Cheng Yan","doi":"10.1109/iciibms50712.2020.9336206","DOIUrl":null,"url":null,"abstract":"A combined Q and Sarsa algorithm based on united simulated annealing strategy proposed in order to solve the problems that the convergence speed of traditional reinforcement learning algorithm is slow. It could balance the relationship between trial-error and efficiency among different methods by a random dynamic adjustment factor method. The roulette algorithm is used to improve Q-learning, and the simulated annealing algorithm is used to replace the selection strategy of Sarsa algorithm, and the overall convergence rate of the algorithm is controlled by the annealing rate. Finally, the task of reward function is subdivided, and the reward function based on action decomposition is designed. The simulation results show that the improved path planning method can effectively reduce the time cost and the collision times of the first path finding.","PeriodicalId":243033,"journal":{"name":"2020 5th International Conference on Intelligent Informatics and Biomedical Sciences (ICIIBMS)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 5th International Conference on Intelligent Informatics and Biomedical Sciences (ICIIBMS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/iciibms50712.2020.9336206","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

A combined Q-learning and Sarsa algorithm based on a united simulated annealing strategy is proposed to address the slow convergence of traditional reinforcement learning algorithms. A random dynamic adjustment factor balances the trade-off between trial-and-error exploration and efficiency across the two methods. The roulette algorithm is used to improve Q-learning, the simulated annealing algorithm replaces the selection strategy of the Sarsa algorithm, and the overall convergence rate is controlled by the annealing rate. Finally, the task of the reward function is subdivided, and a reward function based on action decomposition is designed. Simulation results show that the improved path planning method effectively reduces the time cost and the number of collisions during the first path search.
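For orientation, the sketch below illustrates the kind of joint scheme the abstract describes: roulette-wheel (fitness-proportionate) action selection over Q-values, a simulated-annealing (Metropolis-style) action choice whose temperature decays at the annealing rate, and a per-step switch between Q-learning and Sarsa update targets. The environment interface (env.reset/env.step), the table sizes, the coin-flip stand-in for the random dynamic adjustment factor, and all constants are illustrative assumptions, not the authors' implementation.

```python
import math
import random

import numpy as np

# Hypothetical problem sizes and hyperparameters (not from the paper).
N_STATES, N_ACTIONS = 100, 4
ALPHA, GAMMA = 0.1, 0.9           # learning rate, discount factor
T0, ANNEAL_RATE = 1.0, 0.995      # initial temperature, annealing rate

Q = np.zeros((N_STATES, N_ACTIONS))  # Q-table shared by both update rules


def roulette_action(q_row):
    """Roulette-wheel selection: pick an action with probability
    proportional to its (shifted, non-negative) Q-value."""
    weights = q_row - q_row.min() + 1e-6
    probs = weights / weights.sum()
    return int(np.random.choice(len(q_row), p=probs))


def annealed_action(q_row, temperature):
    """Simulated-annealing selection: accept a random, possibly worse
    action with Metropolis probability exp(delta / T); otherwise greedy."""
    greedy = int(np.argmax(q_row))
    candidate = random.randrange(len(q_row))
    delta = q_row[candidate] - q_row[greedy]
    if delta >= 0 or random.random() < math.exp(delta / max(temperature, 1e-8)):
        return candidate
    return greedy


def train_episode(env, temperature):
    """One episode of the joint update: each step, a simple coin flip
    (stand-in for the random dynamic adjustment factor) chooses between
    the Q-learning target and the Sarsa target."""
    s = env.reset()
    a = roulette_action(Q[s])
    done = False
    while not done:
        s_next, r, done = env.step(a)                 # assumed env interface
        a_next = annealed_action(Q[s_next], temperature)
        if random.random() < 0.5:
            target = r + GAMMA * Q[s_next].max()      # Q-learning (off-policy)
        else:
            target = r + GAMMA * Q[s_next, a_next]    # Sarsa (on-policy)
        Q[s, a] += ALPHA * (target - Q[s, a])
        s, a = s_next, a_next
    return temperature * ANNEAL_RATE                  # lower T => less exploration
```

Lowering the temperature by a fixed annealing rate after each episode is one simple way to let the annealing rate govern how quickly the policy shifts from exploration toward greedy behaviour, which is the role the abstract assigns to it.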