利用多批判 TD3 进行深度强化学习,实现分散式多机器人路径规划

IF 5 3区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Heqing Yin;Chang Wang;Chao Yan;Xiaojia Xiang;Boliang Cai;Changyun Wei
{"title":"利用多批判 TD3 进行深度强化学习,实现分散式多机器人路径规划","authors":"Heqing Yin;Chang Wang;Chao Yan;Xiaojia Xiang;Boliang Cai;Changyun Wei","doi":"10.1109/TCDS.2024.3368055","DOIUrl":null,"url":null,"abstract":"Centralized multirobot path planning is a prevalent approach involving a global planner computing feasible paths for each robot using shared information. Nonetheless, this approach encounters limitations due to communication constraints and computational complexity. To address these challenges, we introduce a novel decentralized multirobot path planning approach that eliminates the need for sharing the states and intentions of robots. Our approach harnesses deep reinforcement learning and features an asynchronous multicritic twin delayed deep deterministic policy gradient (AMC-TD3) algorithm, which enhances the original gate recurrent unit (GRU)-attention-based TD3 algorithm by incorporating a multicritic network and employing an asynchronous training mechanism. By training each critic with a unique reward function, our learned policy enables each robot to navigate toward its long-term objective without colliding with other robots in complex environments. Furthermore, our reward function, grounded in social norms, allows the robots to naturally avoid each other in congested situations. Specifically, we train three critics to encourage each robot to achieve its long-term navigation goal, maintain its moving direction, and prevent collisions with other robots. Our model can learn an end-to-end navigation policy without relying on an accurate map or any localization information, rendering it highly adaptable to various environments. Simulation results reveal that our proposed approach surpasses baselines in several environments with different levels of complexity and robot populations.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":null,"pages":null},"PeriodicalIF":5.0000,"publicationDate":"2024-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Deep Reinforcement Learning With Multicritic TD3 for Decentralized Multirobot Path Planning\",\"authors\":\"Heqing Yin;Chang Wang;Chao Yan;Xiaojia Xiang;Boliang Cai;Changyun Wei\",\"doi\":\"10.1109/TCDS.2024.3368055\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Centralized multirobot path planning is a prevalent approach involving a global planner computing feasible paths for each robot using shared information. Nonetheless, this approach encounters limitations due to communication constraints and computational complexity. To address these challenges, we introduce a novel decentralized multirobot path planning approach that eliminates the need for sharing the states and intentions of robots. Our approach harnesses deep reinforcement learning and features an asynchronous multicritic twin delayed deep deterministic policy gradient (AMC-TD3) algorithm, which enhances the original gate recurrent unit (GRU)-attention-based TD3 algorithm by incorporating a multicritic network and employing an asynchronous training mechanism. By training each critic with a unique reward function, our learned policy enables each robot to navigate toward its long-term objective without colliding with other robots in complex environments. Furthermore, our reward function, grounded in social norms, allows the robots to naturally avoid each other in congested situations. Specifically, we train three critics to encourage each robot to achieve its long-term navigation goal, maintain its moving direction, and prevent collisions with other robots. Our model can learn an end-to-end navigation policy without relying on an accurate map or any localization information, rendering it highly adaptable to various environments. Simulation results reveal that our proposed approach surpasses baselines in several environments with different levels of complexity and robot populations.\",\"PeriodicalId\":54300,\"journal\":{\"name\":\"IEEE Transactions on Cognitive and Developmental Systems\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":5.0000,\"publicationDate\":\"2024-02-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Cognitive and Developmental Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10440479/\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Cognitive and Developmental Systems","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10440479/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

摘要

集中式多机器人路径规划是一种普遍的方法,涉及一个全局规划器,利用共享信息为每个机器人计算可行路径。然而,这种方法受到通信限制和计算复杂性的制约。为了应对这些挑战,我们引入了一种新颖的分散式多机器人路径规划方法,无需共享机器人的状态和意图。我们的方法利用深度强化学习,采用异步多批判孪生延迟深度确定性策略梯度(AMC-TD3)算法,通过纳入多批判网络和采用异步训练机制,增强了原有的基于门递归单元(GRU)-注意力的 TD3 算法。通过用独特的奖励函数训练每个批判者,我们学习到的策略能让每个机器人在复杂环境中朝着自己的长期目标航行,而不会与其他机器人发生碰撞。此外,我们的奖励函数以社会规范为基础,能让机器人在拥挤的情况下自然地避开对方。具体来说,我们训练了三个批评者来鼓励每个机器人实现其长期导航目标,保持其移动方向,并防止与其他机器人发生碰撞。我们的模型可以学习端到端的导航策略,而无需依赖精确的地图或任何定位信息,因此能高度适应各种环境。仿真结果表明,我们提出的方法在具有不同复杂程度和机器人数量的若干环境中超越了基线方法。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Deep Reinforcement Learning With Multicritic TD3 for Decentralized Multirobot Path Planning
Centralized multirobot path planning is a prevalent approach involving a global planner computing feasible paths for each robot using shared information. Nonetheless, this approach encounters limitations due to communication constraints and computational complexity. To address these challenges, we introduce a novel decentralized multirobot path planning approach that eliminates the need for sharing the states and intentions of robots. Our approach harnesses deep reinforcement learning and features an asynchronous multicritic twin delayed deep deterministic policy gradient (AMC-TD3) algorithm, which enhances the original gate recurrent unit (GRU)-attention-based TD3 algorithm by incorporating a multicritic network and employing an asynchronous training mechanism. By training each critic with a unique reward function, our learned policy enables each robot to navigate toward its long-term objective without colliding with other robots in complex environments. Furthermore, our reward function, grounded in social norms, allows the robots to naturally avoid each other in congested situations. Specifically, we train three critics to encourage each robot to achieve its long-term navigation goal, maintain its moving direction, and prevent collisions with other robots. Our model can learn an end-to-end navigation policy without relying on an accurate map or any localization information, rendering it highly adaptable to various environments. Simulation results reveal that our proposed approach surpasses baselines in several environments with different levels of complexity and robot populations.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
CiteScore
7.20
自引率
10.00%
发文量
170
期刊介绍: The IEEE Transactions on Cognitive and Developmental Systems (TCDS) focuses on advances in the study of development and cognition in natural (humans, animals) and artificial (robots, agents) systems. It welcomes contributions from multiple related disciplines including cognitive systems, cognitive robotics, developmental and epigenetic robotics, autonomous and evolutionary robotics, social structures, multi-agent and artificial life systems, computational neuroscience, and developmental psychology. Articles on theoretical, computational, application-oriented, and experimental studies as well as reviews in these areas are considered.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信