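Learning Behaviours for Decentralised Multi-Robot Collision Avoidance in Constrained Pathways Using Curriculum Reinforcement Learning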

IF 5.3 | Zone 2, Computer Science | JCR Q2, Robotics

IEEE Robotics and Automation Letters | Pub Date: 2025-06-19 | DOI: 10.1109/LRA.2025.3581430

Md Mostafizur Rahman Komol; Brendan Tidd; Will Browne; Frederic Maire; Jason Williams; David Howard

Volume 10, Issue 8, pp. 8538-8545. https://ieeexplore.ieee.org/document/11045152/

Citations: 0

Abstract


Learning Behaviours for Decentralised Multi-Robot Collision Avoidance in Constrained Pathways Using Curriculum Reinforcement Learning

Mobile robot teams often require decentralised autonomous navigation through narrow gaps in limited communication environments (e.g., underground search-and-rescue operations). Existing navigation approaches exhibit suboptimal performance for avoiding multi-robot collisions in such bottlenecks due to an inability to address the dynamic nature of the robots. Initial work utilising reinforcement learning has demonstrated success in navigating a single robot through narrow gaps. However, when training agents to produce give-way behaviour for navigating through constrained gaps, end-to-end reinforcement learning using simple rewards suffers from slow convergence due to the increased search space of viable policies. This paper introduces a novel curriculum reinforcement learning framework, incorporating a multi-robot bootstrap curriculum with preprogrammed behaviour to guide initial policy formation, subsequently refined by a gap curriculum that progressively reduces training complexity towards an optimal policy. This framework learns multi-robot interaction behaviours, which are impractical to program manually. Our model achieves a 99% success rate in generating give-way behaviour without inter-agent communication in high-fidelity simulations. The success rate dropped to 73% in simulations incorporating noisy sensors and to 60% in field-robot tests, substantiating our model's practical viability despite sensor noise and real-world uncertainties. The simple benchmark methods lack efficiency in basic interaction behaviours.
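The abstract describes training in ordered stages (a bootstrap curriculum followed by a gap curriculum) but does not state how training moves from one stage to the next. As a rough illustration only, the sketch below assumes a common rule: advance once the success rate over a recent window of episodes reaches a threshold. The class name `CurriculumScheduler`, the stage labels, the window size, and the threshold are all hypothetical and are not taken from the paper.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CurriculumScheduler:
    """Step through ordered training stages based on recent episode success.

    Hypothetical sketch: the promotion rule, window, and threshold are
    assumptions for illustration, not the authors' method.
    """
    stages: list[str]
    threshold: float = 0.8   # assumed success rate needed to advance
    window: int = 20         # assumed number of recent episodes to average
    index: int = 0
    _recent: deque = field(default_factory=deque)

    @property
    def stage(self) -> str:
        return self.stages[self.index]

    def record(self, success: bool) -> str:
        """Log one episode outcome and return the stage to train on next."""
        self._recent.append(success)
        if len(self._recent) > self.window:
            self._recent.popleft()          # keep only the last `window` outcomes
        full = len(self._recent) == self.window
        rate = sum(self._recent) / self.window
        last = self.index == len(self.stages) - 1
        if full and rate >= self.threshold and not last:
            self.index += 1
            self._recent.clear()            # judge the new stage on fresh episodes
        return self.stage


# Usage with made-up stage labels: five straight successes fill the window
# and trigger a move from the first stage to the second.
sched = CurriculumScheduler(stages=["bootstrap", "gap_level_1", "gap_level_2"],
                            threshold=0.8, window=5)
for _ in range(5):
    sched.record(True)
```

Clearing the window on promotion is a design choice in this sketch: it prevents successes earned on an easier stage from counting towards the next promotion.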


Source journal

IEEE Robotics and Automation Letters (Computer Science: Computer Science Applications)

CiteScore: 9.60

Self-citation rate: 15.40%

Articles published: 1428

About the journal: The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.