Mobile robot path planning using deep deterministic policy gradient with differential gaming (DDPG-DG) exploration

Shripad V. Deshpande, Harikrishnan R, Babul Salam KSM Kader Ibrahim, Mahesh Datta Sai Ponnuru
{"title":"Mobile robot path planning using deep deterministic policy gradient with differential gaming (DDPG-DG) exploration","authors":"Shripad V. Deshpande ,&nbsp;Harikrishnan R ,&nbsp;Babul Salam KSM Kader Ibrahim ,&nbsp;Mahesh Datta Sai Ponnuru","doi":"10.1016/j.cogr.2024.08.002","DOIUrl":null,"url":null,"abstract":"<div><p>Mobile robot path planning involves decision-making in uncertain, dynamic conditions, where Reinforcement Learning (RL) algorithms excel in generating safe and optimal paths. The Deep Deterministic Policy Gradient (DDPG) is an RL technique focused on mobile robot navigation. RL algorithms must balance exploitation and exploration to enable effective learning. The balance between these actions directly impacts learning efficiency.</p><p>This research proposes a method combining the DDPG strategy for exploitation with the Differential Gaming (DG) strategy for exploration. The DG algorithm ensures the mobile robot always reaches its target without collisions, thereby adding positive learning episodes to the memory buffer. An epsilon-greedy strategy determines whether to explore or exploit. When exploration is chosen, the DG algorithm is employed. The combination of DG strategy with DDPG facilitates faster learning by increasing the number of successful episodes and reducing the number of failure episodes in the experience buffer. The DDPG algorithm supports continuous state and action spaces, resulting in smoother, non-jerky movements and improved control over the turns when navigating obstacles. Reward shaping considers finer details, ensuring even small advantages in each iteration contribute to learning.</p><p>Through diverse test scenarios, it is demonstrated that DG exploration, compared to random exploration, results in an average increase of 389% in successful target reaches and a 39% decrease in collisions. Additionally, DG exploration shows a 69% improvement in the number of episodes where convergence is achieved within a maximum of 2000 steps.</p></div>","PeriodicalId":100288,"journal":{"name":"Cognitive Robotics","volume":"4 ","pages":"Pages 156-173"},"PeriodicalIF":0.0000,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2667241324000119/pdfft?md5=8c083de5d6ac1af9d3cedcb0733a30fa&pid=1-s2.0-S2667241324000119-main.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Cognitive Robotics","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2667241324000119","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Mobile robot path planning involves decision-making under uncertain, dynamic conditions, where Reinforcement Learning (RL) algorithms excel at generating safe and optimal paths. The Deep Deterministic Policy Gradient (DDPG) is an RL technique well suited to mobile robot navigation. RL algorithms must balance exploitation and exploration to learn effectively, and the balance between the two directly impacts learning efficiency.

This research proposes a method that combines the DDPG strategy for exploitation with a Differential Gaming (DG) strategy for exploration. The DG algorithm ensures the mobile robot always reaches its target without collisions, thereby adding positive learning episodes to the memory buffer. An epsilon-greedy strategy decides whether to explore or exploit; when exploration is chosen, the DG algorithm is employed. Combining the DG strategy with DDPG speeds up learning by increasing the number of successful episodes and reducing the number of failed episodes in the experience buffer. The DDPG algorithm supports continuous state and action spaces, which yields smoother, non-jerky movements and improved control over turns when navigating around obstacles. Reward shaping accounts for finer details, so that even small advantages in each iteration contribute to learning. (A minimal sketch of this exploration-switching scheme follows the abstract.)

Through diverse test scenarios, it is demonstrated that DG exploration, compared to random exploration, results in an average increase of 389% in successful target reaches and a 39% decrease in collisions. Additionally, DG exploration shows a 69% improvement in the number of episodes where convergence is achieved within a maximum of 2000 steps.
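
To make the exploration-switching scheme concrete, below is a minimal Python sketch (not the authors' code) of the epsilon-greedy switch described in the abstract: with probability epsilon the action comes from a Differential Gaming planner that drives the robot toward the target without collisions, otherwise from the DDPG actor. The names ddpg_actor, dg_planner, env, and replay_buffer are hypothetical placeholders for the paper's components, and the environment interface (reset returning a state and target, step returning next state, reward, and done) is assumed purely for illustration.

import random

EPSILON = 0.2      # probability of exploring (illustrative value, not taken from the paper)
MAX_STEPS = 2000   # per-episode step cap mentioned in the abstract

def select_action(state, target, ddpg_actor, dg_planner, epsilon=EPSILON):
    # Epsilon-greedy switch: DG-guided exploration vs. DDPG exploitation.
    if random.random() < epsilon:
        return dg_planner(state, target)   # collision-free, goal-directed action (assumed interface)
    return ddpg_actor(state)               # learned deterministic continuous action (assumed interface)

def run_episode(env, ddpg_actor, dg_planner, replay_buffer):
    # One training episode; DG-guided steps enrich the buffer with successful transitions.
    state, target = env.reset()
    for _ in range(MAX_STEPS):
        action = select_action(state, target, ddpg_actor, dg_planner)
        next_state, reward, done = env.step(action)
        replay_buffer.add((state, action, reward, next_state, done))
        state = next_state
        if done:
            break

Because the DG planner consistently reaches the target, the transitions it writes to the replay buffer bias the experience distribution toward successful episodes, which is the mechanism the abstract credits for faster convergence.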
