Learning Behaviours for Decentralised Multi-Robot Collision Avoidance in Constrained Pathways Using Curriculum Reinforcement Learning
Md Mostafizur Rahman Komol; Brendan Tidd; Will Browne; Frederic Maire; Jason Williams; David Howard
IEEE Robotics and Automation Letters, vol. 10, no. 8, pp. 8538-8545, published 2025-06-19. DOI: 10.1109/LRA.2025.3581430. https://ieeexplore.ieee.org/document/11045152/
Mobile robot teams often require decentralised autonomous navigation through narrow gaps in environments with limited communication (e.g., underground search-and-rescue operations). Existing navigation approaches perform poorly at avoiding multi-robot collisions in such bottlenecks because they cannot account for the dynamic nature of the robots. Initial work using reinforcement learning has successfully navigated a single robot through narrow gaps. However, when agents are trained to produce give-way behaviour for navigating constrained gaps, end-to-end reinforcement learning with simple rewards converges slowly because of the increased search space of viable policies. This paper introduces a novel curriculum reinforcement learning framework that combines a multi-robot bootstrap curriculum, which uses preprogrammed behaviour to guide initial policy formation, with a gap curriculum that then refines the policy by progressively reducing training complexity towards an optimal policy. The framework learns multi-robot interaction behaviours that are impractical to program manually. Our model achieves a 99% success rate in generating give-way behaviour without inter-agent communication in high-fidelity simulations. The success rate falls to 73% in simulations with noisy sensors and to 60% in field-robot tests, substantiating the model's practical viability despite sensor noise and real-world uncertainties. In contrast, the simple benchmark methods were inefficient at basic interaction behaviours.
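The abstract gives no implementation details, so the following is only a minimal Python sketch, under our own assumptions, of how a two-stage curriculum of this kind could be scheduled. During a hypothetical bootstrap stage, the preprogrammed give-way controller's action is executed with a probability that decays linearly to zero; afterwards, a gap stage advances to the next hypothetical gap width once the rolling success rate over a window of episodes reaches a threshold. The class name `CurriculumScheduler`, the action-mixing scheme, the linear decay, the gap widths and their ordering, and the promotion rule are all illustrative assumptions, not the authors' method.

```python
from collections import deque
from dataclasses import dataclass, field
import random


@dataclass
class CurriculumScheduler:
    """Hypothetical two-stage schedule: bootstrap stage, then gap stages."""
    bootstrap_episodes: int = 1000             # assumed length of the bootstrap stage
    gap_widths: tuple = (1.5, 1.2, 1.0, 0.8)   # hypothetical widths (m), assumed easiest first
    promote_threshold: float = 0.9             # assumed success rate needed to advance
    window: int = 100                          # episodes used to estimate the success rate
    episode: int = field(default=0, init=False)
    stage: int = field(default=0, init=False)
    _recent: deque = field(default_factory=deque, init=False, repr=False)

    def __post_init__(self) -> None:
        # Rolling window of episode outcomes; old entries drop off automatically.
        self._recent = deque(maxlen=self.window)

    def scripted_prob(self) -> float:
        """Probability of executing the preprogrammed give-way action.

        Assumption: decays linearly from 1 to 0 over the bootstrap stage."""
        return max(0.0, 1.0 - self.episode / self.bootstrap_episodes)

    def choose_action(self, policy_action, scripted_action, rng=random):
        # Early in training the scripted controller mostly acts; later the learner does.
        if rng.random() < self.scripted_prob():
            return scripted_action
        return policy_action

    def current_gap(self) -> float:
        return self.gap_widths[self.stage]

    def end_episode(self, success: bool) -> None:
        in_bootstrap = self.scripted_prob() > 0.0
        self.episode += 1
        if in_bootstrap:
            return  # gap stages only start counting once bootstrapping has finished
        self._recent.append(success)
        rate = sum(self._recent) / len(self._recent)
        window_full = len(self._recent) == self.window
        if window_full and rate >= self.promote_threshold and self.stage < len(self.gap_widths) - 1:
            self.stage += 1
            self._recent.clear()  # re-estimate the success rate at the new gap width
```

Mixing actions is only one way to let preprogrammed behaviour guide early learning; imitation pretraining or reward shaping would be equally plausible readings of "bootstrap curriculum", and the abstract does not say which the authors used.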
About the journal:
The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.