Shripad V. Deshpande, Harikrishnan R, Babul Salam KSM Kader Ibrahim, Mahesh Datta Sai Ponnuru
{"title":"利用深度确定性策略梯度与微分博弈(DDPG-DG)探索移动机器人路径规划","authors":"Shripad V. Deshpande , Harikrishnan R , Babul Salam KSM Kader Ibrahim , Mahesh Datta Sai Ponnuru","doi":"10.1016/j.cogr.2024.08.002","DOIUrl":null,"url":null,"abstract":"<div><p>Mobile robot path planning involves decision-making in uncertain, dynamic conditions, where Reinforcement Learning (RL) algorithms excel in generating safe and optimal paths. The Deep Deterministic Policy Gradient (DDPG) is an RL technique focused on mobile robot navigation. RL algorithms must balance exploitation and exploration to enable effective learning. The balance between these actions directly impacts learning efficiency.</p><p>This research proposes a method combining the DDPG strategy for exploitation with the Differential Gaming (DG) strategy for exploration. The DG algorithm ensures the mobile robot always reaches its target without collisions, thereby adding positive learning episodes to the memory buffer. An epsilon-greedy strategy determines whether to explore or exploit. When exploration is chosen, the DG algorithm is employed. The combination of DG strategy with DDPG facilitates faster learning by increasing the number of successful episodes and reducing the number of failure episodes in the experience buffer. The DDPG algorithm supports continuous state and action spaces, resulting in smoother, non-jerky movements and improved control over the turns when navigating obstacles. Reward shaping considers finer details, ensuring even small advantages in each iteration contribute to learning.</p><p>Through diverse test scenarios, it is demonstrated that DG exploration, compared to random exploration, results in an average increase of 389% in successful target reaches and a 39% decrease in collisions. Additionally, DG exploration shows a 69% improvement in the number of episodes where convergence is achieved within a maximum of 2000 steps.</p></div>","PeriodicalId":100288,"journal":{"name":"Cognitive Robotics","volume":"4 ","pages":"Pages 156-173"},"PeriodicalIF":0.0000,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2667241324000119/pdfft?md5=8c083de5d6ac1af9d3cedcb0733a30fa&pid=1-s2.0-S2667241324000119-main.pdf","citationCount":"0","resultStr":"{\"title\":\"Mobile robot path planning using deep deterministic policy gradient with differential gaming (DDPG-DG) exploration\",\"authors\":\"Shripad V. Deshpande , Harikrishnan R , Babul Salam KSM Kader Ibrahim , Mahesh Datta Sai Ponnuru\",\"doi\":\"10.1016/j.cogr.2024.08.002\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Mobile robot path planning involves decision-making in uncertain, dynamic conditions, where Reinforcement Learning (RL) algorithms excel in generating safe and optimal paths. The Deep Deterministic Policy Gradient (DDPG) is an RL technique focused on mobile robot navigation. RL algorithms must balance exploitation and exploration to enable effective learning. The balance between these actions directly impacts learning efficiency.</p><p>This research proposes a method combining the DDPG strategy for exploitation with the Differential Gaming (DG) strategy for exploration. The DG algorithm ensures the mobile robot always reaches its target without collisions, thereby adding positive learning episodes to the memory buffer. An epsilon-greedy strategy determines whether to explore or exploit. When exploration is chosen, the DG algorithm is employed. 
The combination of DG strategy with DDPG facilitates faster learning by increasing the number of successful episodes and reducing the number of failure episodes in the experience buffer. The DDPG algorithm supports continuous state and action spaces, resulting in smoother, non-jerky movements and improved control over the turns when navigating obstacles. Reward shaping considers finer details, ensuring even small advantages in each iteration contribute to learning.</p><p>Through diverse test scenarios, it is demonstrated that DG exploration, compared to random exploration, results in an average increase of 389% in successful target reaches and a 39% decrease in collisions. Additionally, DG exploration shows a 69% improvement in the number of episodes where convergence is achieved within a maximum of 2000 steps.</p></div>\",\"PeriodicalId\":100288,\"journal\":{\"name\":\"Cognitive Robotics\",\"volume\":\"4 \",\"pages\":\"Pages 156-173\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.sciencedirect.com/science/article/pii/S2667241324000119/pdfft?md5=8c083de5d6ac1af9d3cedcb0733a30fa&pid=1-s2.0-S2667241324000119-main.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Cognitive Robotics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2667241324000119\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Cognitive Robotics","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2667241324000119","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Mobile robot path planning using deep deterministic policy gradient with differential gaming (DDPG-DG) exploration
Mobile robot path planning involves decision-making under uncertain, dynamic conditions, where Reinforcement Learning (RL) algorithms excel at generating safe and optimal paths. The Deep Deterministic Policy Gradient (DDPG) is an RL technique well suited to mobile robot navigation. RL algorithms must balance exploitation and exploration to learn effectively, and how that balance is struck directly impacts learning efficiency.
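To make DDPG's deterministic, continuous-action policy concrete, the following is a minimal actor-critic sketch in PyTorch. It is an illustrative outline only: the layer sizes, activations, and action-bound handling are assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Deterministic policy: maps a continuous state to a continuous action."""
    def __init__(self, state_dim: int, action_dim: int, max_action: float):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),  # squash output to [-1, 1]
        )
        self.max_action = max_action

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # Scale the squashed output to the robot's actual actuation range.
        return self.max_action * self.net(state)

class Critic(nn.Module):
    """Q-function: scores a (state, action) pair for the DDPG update."""
    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))
```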
This research proposes a method that combines the DDPG strategy for exploitation with a Differential Gaming (DG) strategy for exploration. The DG algorithm ensures that the mobile robot always reaches its target without collisions, thereby adding positive learning episodes to the memory buffer. An epsilon-greedy strategy determines whether to explore or exploit; when exploration is chosen, the DG algorithm is employed. Combining the DG strategy with DDPG speeds up learning by increasing the number of successful episodes and reducing the number of failed episodes in the experience buffer. The DDPG algorithm supports continuous state and action spaces, resulting in smoother, non-jerky movements and improved control over turns when navigating around obstacles. Reward shaping captures finer details, ensuring that even small advantages gained in each iteration contribute to learning.
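A minimal sketch of how such an epsilon-greedy switch between DG exploration and DDPG exploitation, together with a shaped reward, might be wired up. The names `dg_controller` and `replay_buffer`, and all numeric reward values, are illustrative assumptions rather than the authors' implementation.

```python
import random

def select_action(state, actor, dg_controller, epsilon: float):
    """Epsilon-greedy switch between DG exploration and DDPG exploitation.

    `actor` is the learned DDPG policy (e.g., the Actor above); `dg_controller`
    stands in for the differential-gaming strategy that steers the robot to the
    target without collisions.
    """
    if random.random() < epsilon:
        # Explore: the DG strategy yields a collision-free, goal-reaching action,
        # so the resulting episode adds a positive sample to the buffer.
        return dg_controller(state)
    # Exploit: use the learned deterministic DDPG policy.
    return actor(state)

def store_transition(replay_buffer, state, action, reward, next_state, done):
    """Every transition, whether it came from DG exploration or DDPG exploitation,
    goes into the shared experience buffer used for off-policy DDPG updates."""
    replay_buffer.append((state, action, reward, next_state, done))

def shaped_reward(dist_to_goal, prev_dist_to_goal, collided, reached):
    """Illustrative shaped reward: small progress toward the goal earns a small
    positive signal so every step contributes to learning (values assumed)."""
    if collided:
        return -100.0
    if reached:
        return 100.0
    return 10.0 * (prev_dist_to_goal - dist_to_goal)  # reward incremental progress
```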
Across diverse test scenarios, it is demonstrated that DG exploration, compared to random exploration, yields an average 389% increase in successful target-reaching episodes and a 39% decrease in collisions. Additionally, DG exploration shows a 69% improvement in the number of episodes that converge within a maximum of 2000 steps.