{"title":"Deep Reinforcement Learning of Cooperative Control with Four Robotic Agents by MADDPG","authors":"Zhaoyang Wang, Renzhuo Wan, Xi Gui, Guopeng Zhou","doi":"10.1109/icceic51584.2020.00061","DOIUrl":null,"url":null,"abstract":"Due to the nature of complexity, inflexibility and non-robustness of classical cooperative control algorithms, the deep reinforcement learning has been widely researched and applied in collective and continuous behaviour control. Especially for multi-agents in real world, acquiring a full view world with a quick learning is still a great challenge. Inspired by Policy Gradient (PG) and its successors, a toy model with multi-agents by four two-dimensional manipulators environment is built based on physics engine-based MuJoCo. With a modified deep deterministic policy gradient algorithm and different credit strategies for individual agent, the cooperation and competition behaviour to target location between agents are studied. The experimental results show that each robot can complete the task with a negligible convergence effect, indicating that the MADDPG algorithm has a good performance in a complex environment, and successfully learn the strategy of multi-agent collaboration. However, with the instability of the environment caused by the increase in the number of agents, deep reinforcement learning has certain difficulties in the joint action space.","PeriodicalId":135840,"journal":{"name":"2020 International Conference on Computer Engineering and Intelligent Control (ICCEIC)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 International Conference on Computer Engineering and Intelligent Control (ICCEIC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/icceic51584.2020.00061","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 6
Abstract
Because classical cooperative control algorithms tend to be complex, inflexible, and non-robust, deep reinforcement learning has been widely researched and applied to collective and continuous behaviour control. Especially for multiple agents in the real world, acquiring a full view of the environment while learning quickly remains a great challenge. Inspired by Policy Gradient (PG) methods and their successors, a toy multi-agent environment with four two-dimensional manipulators is built on the MuJoCo physics engine. Using a modified deep deterministic policy gradient algorithm with different credit-assignment strategies for the individual agents, the cooperative and competitive behaviour of the agents in reaching target locations is studied. The experimental results show that each robot can complete the task with negligible convergence effects, indicating that the MADDPG algorithm performs well in a complex environment and successfully learns a multi-agent collaboration strategy. However, as the number of agents increases, the environment becomes increasingly non-stationary, and deep reinforcement learning faces certain difficulties in the growing joint action space.
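The "modified deep deterministic policy gradient" the abstract refers to follows the MADDPG pattern: each agent keeps a decentralized actor that sees only its own observation, while a centralized critic is trained on the joint observations and actions of all agents. The sketch below is a minimal illustration of that update, not the authors' code: the network sizes, learning rates, and the OBS_DIM/ACT_DIM placeholders are assumptions, target networks are omitted for brevity, and the replay batch is assumed to already contain the target actions for the next state.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sizes for four planar (two-dimensional) manipulators.
N_AGENTS, OBS_DIM, ACT_DIM = 4, 8, 2

class Actor(nn.Module):
    """Decentralized policy: maps one agent's observation to its action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 64), nn.ReLU(),
            nn.Linear(64, ACT_DIM), nn.Tanh())  # continuous torques in [-1, 1]

    def forward(self, obs):
        return self.net(obs)

class Critic(nn.Module):
    """Centralized critic: scores the joint observation-action of all agents."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_AGENTS * (OBS_DIM + ACT_DIM), 128), nn.ReLU(),
            nn.Linear(128, 1))

    def forward(self, all_obs, all_act):
        return self.net(torch.cat([all_obs, all_act], dim=-1))

actors = [Actor() for _ in range(N_AGENTS)]
critics = [Critic() for _ in range(N_AGENTS)]
actor_opts = [torch.optim.Adam(a.parameters(), lr=1e-3) for a in actors]
critic_opts = [torch.optim.Adam(c.parameters(), lr=1e-3) for c in critics]

def update_agent(i, batch, gamma=0.95):
    """One MADDPG gradient step for agent i on a replay-buffer batch.

    batch tensors have shapes (B, N_AGENTS, OBS_DIM/ACT_DIM), rewards (B, N_AGENTS);
    target_act would normally come from target actors applied to next_obs.
    """
    obs, act, rew, next_obs, target_act = batch
    B = obs.shape[0]
    flat = lambda x: x.reshape(B, -1)  # concatenate all agents for the critic

    # Critic: regress the centralized Q toward the one-step TD target,
    # using agent i's own reward (per-agent credit assignment).
    with torch.no_grad():
        y = rew[:, i:i+1] + gamma * critics[i](flat(next_obs), flat(target_act))
    critic_loss = F.mse_loss(critics[i](flat(obs), flat(act)), y)
    critic_opts[i].zero_grad(); critic_loss.backward(); critic_opts[i].step()

    # Actor: ascend Q with respect to agent i's own action,
    # holding the other agents' sampled actions fixed.
    act_i = actors[i](obs[:, i])
    joint = torch.cat([act[:, :i], act_i.unsqueeze(1), act[:, i+1:]], dim=1)
    actor_loss = -critics[i](flat(obs), flat(joint)).mean()
    actor_opts[i].zero_grad(); actor_loss.backward(); actor_opts[i].step()
```

Because the critic conditions on all agents' actions, each agent's learning target stays well-defined even as the other policies change, which is why this scheme copes better with multi-agent non-stationarity than independent DDPG learners; the joint input size still grows linearly with the number of agents, consistent with the scaling difficulty the abstract notes.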