{"title":"Heuristic dynamic programming for mobile robot path planning based on Dyna approach","authors":"Seaar Al Dabooni, D. Wunsch","doi":"10.1109/IJCNN.2016.7727679","DOIUrl":null,"url":null,"abstract":"This paper presents a direct heuristic dynamic programming (HDP) based on Dyna planning (Dyna_HDP) for online model learning in a Markov decision process. This novel technique is composed of HDP policy learning to construct the Dyna agent for speeding up the learning time. We evaluate Dyna_HDP on a differential-drive wheeled mobile robot navigation problem in a 2D maze. The simulation is introduced to compare Dyna_HDP with other traditional reinforcement learning algorithms, namely one step Q-learning, Sarsa (λ), and Dyna_Q, under the same benchmark conditions. We demonstrate that Dyna_HDP has a faster near-optimal path than other algorithms, with high stability. In addition, we also confirm that the Dyna_HDP method can be applied in a multi-robot path planning problem. The virtual common environment model is learned from sharing the robots' experiences which significantly reduces the learning time.","PeriodicalId":109405,"journal":{"name":"2016 International Joint Conference on Neural Networks (IJCNN)","volume":"64 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"15","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 International Joint Conference on Neural Networks (IJCNN)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IJCNN.2016.7727679","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 15
Abstract
This paper presents direct heuristic dynamic programming (HDP) based on Dyna planning (Dyna_HDP) for online model learning in a Markov decision process. The technique combines HDP policy learning with a Dyna agent to shorten learning time. We evaluate Dyna_HDP on a differential-drive wheeled mobile robot navigating a 2D maze. Simulations compare Dyna_HDP with traditional reinforcement learning algorithms, namely one-step Q-learning, Sarsa(λ), and Dyna_Q, under the same benchmark conditions. We demonstrate that Dyna_HDP converges to a near-optimal path faster than the other algorithms, and with high stability. In addition, we confirm that Dyna_HDP can be applied to a multi-robot path planning problem: a virtual common environment model is learned from the robots' shared experiences, which significantly reduces learning time.
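To make the Dyna planning loop the paper builds on concrete, below is a minimal tabular Dyna-Q sketch (one of the baselines above): the agent learns directly from real transitions, stores them in a learned model, and then replays simulated transitions from that model to speed up learning. The class name, hyperparameter values, and the assumption of a deterministic discrete maze are illustrative choices, not details taken from the paper.

```python
import random
from collections import defaultdict

class DynaQAgent:
    """Tabular Dyna-Q sketch: direct RL + model learning + planning replay."""

    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1, planning_steps=10):
        self.q = defaultdict(float)   # Q(s, a) value table
        self.model = {}               # learned model: (s, a) -> (reward, next_state)
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.planning_steps = planning_steps

    def act(self, state):
        # Epsilon-greedy action selection.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def _update(self, s, a, r, s2):
        # One-step Q-learning backup.
        best_next = max(self.q[(s2, a2)] for a2 in self.actions)
        self.q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.q[(s, a)])

    def learn(self, s, a, r, s2):
        self._update(s, a, r, s2)             # direct RL from the real transition
        self.model[(s, a)] = (r, s2)          # model learning (deterministic maze assumed)
        for _ in range(self.planning_steps):  # planning: replay simulated experience
            (ps, pa), (pr, ps2) = random.choice(list(self.model.items()))
            self._update(ps, pa, pr, ps2)
```

The paper's Dyna_HDP replaces the tabular Q-learning component here with HDP policy learning (actor-critic style value approximation), while keeping the same model-based planning replay to accelerate convergence.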