Shripad V. Deshpande, Harikrishnan R, Babul Salam KSM Kader Ibrahim, Mahesh Datta Sai Ponnuru
{"title":"利用深度确定性策略梯度与微分博弈(DDPG-DG)探索移动机器人路径规划","authors":"Shripad V. Deshpande , Harikrishnan R , Babul Salam KSM Kader Ibrahim , Mahesh Datta Sai Ponnuru","doi":"10.1016/j.cogr.2024.08.002","DOIUrl":null,"url":null,"abstract":"<div><p>Mobile robot path planning involves decision-making in uncertain, dynamic conditions, where Reinforcement Learning (RL) algorithms excel in generating safe and optimal paths. The Deep Deterministic Policy Gradient (DDPG) is an RL technique focused on mobile robot navigation. RL algorithms must balance exploitation and exploration to enable effective learning. The balance between these actions directly impacts learning efficiency.</p><p>This research proposes a method combining the DDPG strategy for exploitation with the Differential Gaming (DG) strategy for exploration. The DG algorithm ensures the mobile robot always reaches its target without collisions, thereby adding positive learning episodes to the memory buffer. An epsilon-greedy strategy determines whether to explore or exploit. When exploration is chosen, the DG algorithm is employed. The combination of DG strategy with DDPG facilitates faster learning by increasing the number of successful episodes and reducing the number of failure episodes in the experience buffer. The DDPG algorithm supports continuous state and action spaces, resulting in smoother, non-jerky movements and improved control over the turns when navigating obstacles. Reward shaping considers finer details, ensuring even small advantages in each iteration contribute to learning.</p><p>Through diverse test scenarios, it is demonstrated that DG exploration, compared to random exploration, results in an average increase of 389% in successful target reaches and a 39% decrease in collisions. Additionally, DG exploration shows a 69% improvement in the number of episodes where convergence is achieved within a maximum of 2000 steps.</p></div>","PeriodicalId":100288,"journal":{"name":"Cognitive Robotics","volume":"4 ","pages":"Pages 156-173"},"PeriodicalIF":0.0000,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2667241324000119/pdfft?md5=8c083de5d6ac1af9d3cedcb0733a30fa&pid=1-s2.0-S2667241324000119-main.pdf","citationCount":"0","resultStr":"{\"title\":\"Mobile robot path planning using deep deterministic policy gradient with differential gaming (DDPG-DG) exploration\",\"authors\":\"Shripad V. Deshpande , Harikrishnan R , Babul Salam KSM Kader Ibrahim , Mahesh Datta Sai Ponnuru\",\"doi\":\"10.1016/j.cogr.2024.08.002\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Mobile robot path planning involves decision-making in uncertain, dynamic conditions, where Reinforcement Learning (RL) algorithms excel in generating safe and optimal paths. The Deep Deterministic Policy Gradient (DDPG) is an RL technique focused on mobile robot navigation. RL algorithms must balance exploitation and exploration to enable effective learning. The balance between these actions directly impacts learning efficiency.</p><p>This research proposes a method combining the DDPG strategy for exploitation with the Differential Gaming (DG) strategy for exploration. The DG algorithm ensures the mobile robot always reaches its target without collisions, thereby adding positive learning episodes to the memory buffer. An epsilon-greedy strategy determines whether to explore or exploit. When exploration is chosen, the DG algorithm is employed. 
The combination of DG strategy with DDPG facilitates faster learning by increasing the number of successful episodes and reducing the number of failure episodes in the experience buffer. The DDPG algorithm supports continuous state and action spaces, resulting in smoother, non-jerky movements and improved control over the turns when navigating obstacles. Reward shaping considers finer details, ensuring even small advantages in each iteration contribute to learning.</p><p>Through diverse test scenarios, it is demonstrated that DG exploration, compared to random exploration, results in an average increase of 389% in successful target reaches and a 39% decrease in collisions. Additionally, DG exploration shows a 69% improvement in the number of episodes where convergence is achieved within a maximum of 2000 steps.</p></div>\",\"PeriodicalId\":100288,\"journal\":{\"name\":\"Cognitive Robotics\",\"volume\":\"4 \",\"pages\":\"Pages 156-173\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.sciencedirect.com/science/article/pii/S2667241324000119/pdfft?md5=8c083de5d6ac1af9d3cedcb0733a30fa&pid=1-s2.0-S2667241324000119-main.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Cognitive Robotics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2667241324000119\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Cognitive Robotics","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2667241324000119","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Mobile robot path planning using deep deterministic policy gradient with differential gaming (DDPG-DG) exploration
Mobile robot path planning involves decision-making under uncertain, dynamic conditions, where Reinforcement Learning (RL) algorithms excel at generating safe and optimal paths. The Deep Deterministic Policy Gradient (DDPG) is an RL technique well suited to mobile robot navigation. RL algorithms must balance exploitation and exploration to learn effectively, and how that balance is struck directly impacts learning efficiency.
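To make DDPG's deterministic, continuous-action policy concrete, the following is a minimal actor-critic sketch in PyTorch. It is an illustrative outline only: the layer sizes, activations, and action-bound handling are assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Deterministic policy: maps a continuous state to a continuous action."""
    def __init__(self, state_dim: int, action_dim: int, max_action: float):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),  # squash output to [-1, 1]
        )
        self.max_action = max_action

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # Scale the squashed output to the robot's actual actuation range.
        return self.max_action * self.net(state)

class Critic(nn.Module):
    """Q-function: scores a (state, action) pair for the DDPG update."""
    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))
```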
This research proposes a method that combines the DDPG strategy for exploitation with a Differential Gaming (DG) strategy for exploration. The DG algorithm ensures that the mobile robot always reaches its target without collisions, thereby adding positive learning episodes to the memory buffer. An epsilon-greedy strategy determines whether to explore or exploit; when exploration is chosen, the DG algorithm is employed. Combining the DG strategy with DDPG speeds up learning by increasing the number of successful episodes and reducing the number of failed episodes in the experience buffer. The DDPG algorithm supports continuous state and action spaces, resulting in smoother, non-jerky movements and improved control over turns when navigating around obstacles. Reward shaping captures finer details, ensuring that even small advantages gained in each iteration contribute to learning.
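A minimal sketch of how such an epsilon-greedy switch between DG exploration and DDPG exploitation, together with a shaped reward, might be wired up. The names `dg_controller` and `replay_buffer`, and all numeric reward values, are illustrative assumptions rather than the authors' implementation.

```python
import random

def select_action(state, actor, dg_controller, epsilon: float):
    """Epsilon-greedy switch between DG exploration and DDPG exploitation.

    `actor` is the learned DDPG policy (e.g., the Actor above); `dg_controller`
    stands in for the differential-gaming strategy that steers the robot to the
    target without collisions.
    """
    if random.random() < epsilon:
        # Explore: the DG strategy yields a collision-free, goal-reaching action,
        # so the resulting episode adds a positive sample to the buffer.
        return dg_controller(state)
    # Exploit: use the learned deterministic DDPG policy.
    return actor(state)

def store_transition(replay_buffer, state, action, reward, next_state, done):
    """Every transition, whether it came from DG exploration or DDPG exploitation,
    goes into the shared experience buffer used for off-policy DDPG updates."""
    replay_buffer.append((state, action, reward, next_state, done))

def shaped_reward(dist_to_goal, prev_dist_to_goal, collided, reached):
    """Illustrative shaped reward: small progress toward the goal earns a small
    positive signal so every step contributes to learning (values assumed)."""
    if collided:
        return -100.0
    if reached:
        return 100.0
    return 10.0 * (prev_dist_to_goal - dist_to_goal)  # reward incremental progress
```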
Across diverse test scenarios, it is demonstrated that DG exploration, compared to random exploration, yields an average 389% increase in successful target-reaching episodes and a 39% decrease in collisions. Additionally, DG exploration shows a 69% improvement in the number of episodes that converge within a maximum of 2000 steps.