Heuristic dynamic programming for mobile robot path planning based on Dyna approach

2016 International Joint Conference on Neural Networks (IJCNN). Pub Date: 2016-07-24. DOI: 10.1109/IJCNN.2016.7727679

Seaar Al Dabooni, D. Wunsch

Citations: 15

Abstract

This paper presents a direct heuristic dynamic programming (HDP) method based on Dyna planning (Dyna_HDP) for online model learning in a Markov decision process. The technique uses HDP policy learning to construct the Dyna agent, which shortens the learning time. We evaluate Dyna_HDP on a differential-drive wheeled mobile robot navigation problem in a 2D maze. In simulation, we compare Dyna_HDP with traditional reinforcement learning algorithms, namely one-step Q-learning, Sarsa(λ), and Dyna_Q, under the same benchmark conditions. The results show that Dyna_HDP reaches a near-optimal path faster than the other algorithms, with high stability. We also confirm that the Dyna_HDP method can be applied to a multi-robot path planning problem: a virtual common environment model is learned from the robots' shared experiences, which significantly reduces the learning time.
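To make the Dyna idea concrete, here is a minimal sketch of tabular Dyna_Q, one of the baselines the abstract names, on a small grid maze. It is not the authors' Dyna_HDP: the paper builds the Dyna agent from HDP policy learning, which this sketch does not attempt to reproduce. The maze layout, the reward of 1 at the goal, the point-robot motion model (rather than a differential-drive robot), and all hyperparameters and function names are assumptions chosen for illustration.

```python
import random

# Hypothetical 4x5 maze: S = start, G = goal, # = wall, . = free cell.
MAZE = [
    "S..#.",
    ".#...",
    ".#.#.",
    "...#G",
]
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def find(ch):
    """Return the (row, col) of the first cell containing ch."""
    for r, row in enumerate(MAZE):
        c = row.find(ch)
        if c >= 0:
            return (r, c)
    raise ValueError(ch)


START, GOAL = find("S"), find("G")


def step(state, action):
    """Deterministic move; bumping into a wall or border leaves the state unchanged."""
    r, c = state
    dr, dc = ACTIONS[action]
    nr, nc = r + dr, c + dc
    inside = 0 <= nr < len(MAZE) and 0 <= nc < len(MAZE[0])
    if not inside or MAZE[nr][nc] == "#":
        nr, nc = r, c
    nxt = (nr, nc)
    return nxt, (1.0 if nxt == GOAL else 0.0)


def dyna_q(episodes=50, planning_steps=20, alpha=0.5, gamma=0.95,
           epsilon=0.1, seed=0):
    """Tabular Dyna_Q: each real step is followed by simulated model updates."""
    rng = random.Random(seed)
    q = {}       # (state, action) -> estimated value, default 0
    model = {}   # (state, action) -> (reward, next_state), learned online
    seen = []    # state-action pairs available for planning

    def value(s):
        return max(q.get((s, a), 0.0) for a in range(4))

    def update(s, a, r, s2):
        # One-step Q-learning backup; the goal is terminal, so no bootstrap there.
        target = r + (0.0 if s2 == GOAL else gamma * value(s2))
        old = q.get((s, a), 0.0)
        q[(s, a)] = old + alpha * (target - old)

    steps_per_episode = []
    for _ in range(episodes):
        s, steps = START, 0
        while s != GOAL:
            if rng.random() < epsilon:
                a = rng.randrange(4)
            else:  # greedy with random tie-breaking
                best = value(s)
                a = rng.choice([b for b in range(4)
                                if q.get((s, b), 0.0) == best])
            s2, r = step(s, a)
            update(s, a, r, s2)            # direct RL from real experience
            if (s, a) not in model:
                seen.append((s, a))
            model[(s, a)] = (r, s2)        # model learning
            for _ in range(planning_steps):  # planning from the learned model
                ps, pa = rng.choice(seen)
                pr, ps2 = model[(ps, pa)]
                update(ps, pa, pr, ps2)
            s, steps = s2, steps + 1
        steps_per_episode.append(steps)
    return q, steps_per_episode


def greedy_path(q, max_steps=100):
    """Follow the greedy policy from START until GOAL or max_steps."""
    path = [START]
    while path[-1] != GOAL and len(path) <= max_steps:
        s = path[-1]
        a = max(range(4), key=lambda b: q.get((s, b), 0.0))
        path.append(step(s, a)[0])
    return path


if __name__ == "__main__":
    q_table, steps = dyna_q()
    print("steps per episode:", steps)
    print("greedy path:", greedy_path(q_table))
```

Setting planning_steps=0 reduces the sketch to plain one-step Q-learning, which gives a simple way to see how much the learned model contributes. A shared-model multi-robot variant, as described in the abstract, would presumably have several agents write into the same model dictionary, but that is not shown here.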
