Deep Reinforcement Learning With Multicritic TD3 for Decentralized Multirobot Path Planning

IF 5 3区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Cognitive and Developmental Systems Pub Date : 2024-02-20 DOI:10.1109/TCDS.2024.3368055

Heqing Yin;Chang Wang;Chao Yan;Xiaojia Xiang;Boliang Cai;Changyun Wei

{"title":"Deep Reinforcement Learning With Multicritic TD3 for Decentralized Multirobot Path Planning","authors":"Heqing Yin;Chang Wang;Chao Yan;Xiaojia Xiang;Boliang Cai;Changyun Wei","doi":"10.1109/TCDS.2024.3368055","DOIUrl":null,"url":null,"abstract":"Centralized multirobot path planning is a prevalent approach involving a global planner computing feasible paths for each robot using shared information. Nonetheless, this approach encounters limitations due to communication constraints and computational complexity. To address these challenges, we introduce a novel decentralized multirobot path planning approach that eliminates the need for sharing the states and intentions of robots. Our approach harnesses deep reinforcement learning and features an asynchronous multicritic twin delayed deep deterministic policy gradient (AMC-TD3) algorithm, which enhances the original gate recurrent unit (GRU)-attention-based TD3 algorithm by incorporating a multicritic network and employing an asynchronous training mechanism. By training each critic with a unique reward function, our learned policy enables each robot to navigate toward its long-term objective without colliding with other robots in complex environments. Furthermore, our reward function, grounded in social norms, allows the robots to naturally avoid each other in congested situations. Specifically, we train three critics to encourage each robot to achieve its long-term navigation goal, maintain its moving direction, and prevent collisions with other robots. Our model can learn an end-to-end navigation policy without relying on an accurate map or any localization information, rendering it highly adaptable to various environments. Simulation results reveal that our proposed approach surpasses baselines in several environments with different levels of complexity and robot populations.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"16 4","pages":"1233-1247"},"PeriodicalIF":5.0000,"publicationDate":"2024-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Cognitive and Developmental Systems","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10440479/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

Centralized multirobot path planning is a prevalent approach involving a global planner computing feasible paths for each robot using shared information. Nonetheless, this approach encounters limitations due to communication constraints and computational complexity. To address these challenges, we introduce a novel decentralized multirobot path planning approach that eliminates the need for sharing the states and intentions of robots. Our approach harnesses deep reinforcement learning and features an asynchronous multicritic twin delayed deep deterministic policy gradient (AMC-TD3) algorithm, which enhances the original gate recurrent unit (GRU)-attention-based TD3 algorithm by incorporating a multicritic network and employing an asynchronous training mechanism. By training each critic with a unique reward function, our learned policy enables each robot to navigate toward its long-term objective without colliding with other robots in complex environments. Furthermore, our reward function, grounded in social norms, allows the robots to naturally avoid each other in congested situations. Specifically, we train three critics to encourage each robot to achieve its long-term navigation goal, maintain its moving direction, and prevent collisions with other robots. Our model can learn an end-to-end navigation policy without relying on an accurate map or any localization information, rendering it highly adaptable to various environments. Simulation results reveal that our proposed approach surpasses baselines in several environments with different levels of complexity and robot populations.

查看原文本刊更多论文

利用多批判 TD3 进行深度强化学习，实现分散式多机器人路径规划

集中式多机器人路径规划是一种普遍的方法，涉及一个全局规划器，利用共享信息为每个机器人计算可行路径。然而，这种方法受到通信限制和计算复杂性的制约。为了应对这些挑战，我们引入了一种新颖的分散式多机器人路径规划方法，无需共享机器人的状态和意图。我们的方法利用深度强化学习，采用异步多批判孪生延迟深度确定性策略梯度（AMC-TD3）算法，通过纳入多批判网络和采用异步训练机制，增强了原有的基于门递归单元（GRU）-注意力的 TD3 算法。通过用独特的奖励函数训练每个批判者，我们学习到的策略能让每个机器人在复杂环境中朝着自己的长期目标航行，而不会与其他机器人发生碰撞。此外，我们的奖励函数以社会规范为基础，能让机器人在拥挤的情况下自然地避开对方。具体来说，我们训练了三个批评者来鼓励每个机器人实现其长期导航目标，保持其移动方向，并防止与其他机器人发生碰撞。我们的模型可以学习端到端的导航策略，而无需依赖精确的地图或任何定位信息，因此能高度适应各种环境。仿真结果表明，我们提出的方法在具有不同复杂程度和机器人数量的若干环境中超越了基线方法。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

IEEE Transactions on Cognitive and Developmental Systems Computer Science-Software

CiteScore

7.20

自引率

10.00%

发文量

170

期刊介绍： The IEEE Transactions on Cognitive and Developmental Systems (TCDS) focuses on advances in the study of development and cognition in natural (humans, animals) and artificial (robots, agents) systems. It welcomes contributions from multiple related disciplines including cognitive systems, cognitive robotics, developmental and epigenetic robotics, autonomous and evolutionary robotics, social structures, multi-agent and artificial life systems, computational neuroscience, and developmental psychology. Articles on theoretical, computational, application-oriented, and experimental studies as well as reviews in these areas are considered.