{"title":"Deep Reinforcement Learning of Cooperative Control with Four Robotic Agents by MADDPG","authors":"Zhaoyang Wang, Renzhuo Wan, Xi Gui, Guopeng Zhou","doi":"10.1109/icceic51584.2020.00061","DOIUrl":null,"url":null,"abstract":"Due to the nature of complexity, inflexibility and non-robustness of classical cooperative control algorithms, the deep reinforcement learning has been widely researched and applied in collective and continuous behaviour control. Especially for multi-agents in real world, acquiring a full view world with a quick learning is still a great challenge. Inspired by Policy Gradient (PG) and its successors, a toy model with multi-agents by four two-dimensional manipulators environment is built based on physics engine-based MuJoCo. With a modified deep deterministic policy gradient algorithm and different credit strategies for individual agent, the cooperation and competition behaviour to target location between agents are studied. The experimental results show that each robot can complete the task with a negligible convergence effect, indicating that the MADDPG algorithm has a good performance in a complex environment, and successfully learn the strategy of multi-agent collaboration. However, with the instability of the environment caused by the increase in the number of agents, deep reinforcement learning has certain difficulties in the joint action space.","PeriodicalId":135840,"journal":{"name":"2020 International Conference on Computer Engineering and Intelligent Control (ICCEIC)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 International Conference on Computer Engineering and Intelligent Control (ICCEIC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/icceic51584.2020.00061","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 6
Abstract
Because classical cooperative control algorithms tend to be complex, inflexible, and non-robust, deep reinforcement learning has been widely researched and applied to collective and continuous behaviour control. Especially for multiple agents in the real world, acquiring a full view of the environment while learning quickly remains a great challenge. Inspired by Policy Gradient (PG) methods and their successors, a toy multi-agent environment with four two-dimensional manipulators is built on the MuJoCo physics engine. Using a modified deep deterministic policy gradient algorithm with different credit-assignment strategies for the individual agents, the cooperative and competitive behaviour of the agents in reaching target locations is studied. The experimental results show that each robot can complete the task with negligible convergence effects, indicating that the MADDPG algorithm performs well in a complex environment and successfully learns a multi-agent collaboration strategy. However, as the number of agents increases, the environment becomes increasingly non-stationary, and deep reinforcement learning faces certain difficulties in the growing joint action space.
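The "modified deep deterministic policy gradient" the abstract refers to follows the MADDPG pattern: each agent keeps a decentralized actor that sees only its own observation, while a centralized critic is trained on the joint observations and actions of all agents. The sketch below is a minimal illustration of that update, not the authors' code: the network sizes, learning rates, and the OBS_DIM/ACT_DIM placeholders are assumptions, target networks are omitted for brevity, and the replay batch is assumed to already contain the target actions for the next state.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sizes for four planar (two-dimensional) manipulators.
N_AGENTS, OBS_DIM, ACT_DIM = 4, 8, 2

class Actor(nn.Module):
    """Decentralized policy: maps one agent's observation to its action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 64), nn.ReLU(),
            nn.Linear(64, ACT_DIM), nn.Tanh())  # continuous torques in [-1, 1]

    def forward(self, obs):
        return self.net(obs)

class Critic(nn.Module):
    """Centralized critic: scores the joint observation-action of all agents."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_AGENTS * (OBS_DIM + ACT_DIM), 128), nn.ReLU(),
            nn.Linear(128, 1))

    def forward(self, all_obs, all_act):
        return self.net(torch.cat([all_obs, all_act], dim=-1))

actors = [Actor() for _ in range(N_AGENTS)]
critics = [Critic() for _ in range(N_AGENTS)]
actor_opts = [torch.optim.Adam(a.parameters(), lr=1e-3) for a in actors]
critic_opts = [torch.optim.Adam(c.parameters(), lr=1e-3) for c in critics]

def update_agent(i, batch, gamma=0.95):
    """One MADDPG gradient step for agent i on a replay-buffer batch.

    batch tensors have shapes (B, N_AGENTS, OBS_DIM/ACT_DIM), rewards (B, N_AGENTS);
    target_act would normally come from target actors applied to next_obs.
    """
    obs, act, rew, next_obs, target_act = batch
    B = obs.shape[0]
    flat = lambda x: x.reshape(B, -1)  # concatenate all agents for the critic

    # Critic: regress the centralized Q toward the one-step TD target,
    # using agent i's own reward (per-agent credit assignment).
    with torch.no_grad():
        y = rew[:, i:i+1] + gamma * critics[i](flat(next_obs), flat(target_act))
    critic_loss = F.mse_loss(critics[i](flat(obs), flat(act)), y)
    critic_opts[i].zero_grad(); critic_loss.backward(); critic_opts[i].step()

    # Actor: ascend Q with respect to agent i's own action,
    # holding the other agents' sampled actions fixed.
    act_i = actors[i](obs[:, i])
    joint = torch.cat([act[:, :i], act_i.unsqueeze(1), act[:, i+1:]], dim=1)
    actor_loss = -critics[i](flat(obs), flat(joint)).mean()
    actor_opts[i].zero_grad(); actor_loss.backward(); actor_opts[i].step()
```

Because the critic conditions on all agents' actions, each agent's learning target stays well-defined even as the other policies change, which is why this scheme copes better with multi-agent non-stationarity than independent DDPG learners; the joint input size still grows linearly with the number of agents, consistent with the scaling difficulty the abstract notes.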