{"title":"多智能体连续控制的端到端强化学习","authors":"Z. Jiao, J. Oh","doi":"10.1109/ICMLA.2019.00100","DOIUrl":null,"url":null,"abstract":"In end-to-end reinforcement learning, an agent captures the entire mapping from its raw sensor data to actuation commands using a single neural network. End-to-end reinforcement learning is mostly studied in single-agent domains, and its scalability to multi-agent setting is under-explored. Without effective techniques, learning effective policies based on the joint observation of agents can be intractable, particularly when sensor data perceived by each agent is high-dimensional. Extending the multi-agent actor-critic method MADDPG, this paper presents Rec-MADDPG, an end-to-end reinforcement learning method for multi-agent continuous control in a cooperative environment. To ease end-to-end learning in a multi-agent setting, we proposed two embedding mechanisms, joint and independent embedding, to project agents' joint sensor observation to low-dimensional features. For training efficiency, we applied parameter sharing and the A3C-based asynchronous framework to Rec-MADDPG. Considering the challenges that can arise in real-world multi-agent control, we evaluated Rec-MADDPG in robotic navigation tasks based on realistic simulated robots and physics enable environments. Through extensive evaluation, we demonstrated that Rec-MADDPG can significantly outperform MADDPG and was able to learn individual end-to-end policies for continuous control based on raw sensor data. In addition, compared to joint embedding, independent embedding enabled Rec-MADDPG to learn even better optimal policies.","PeriodicalId":436714,"journal":{"name":"2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"End-to-End Reinforcement Learning for Multi-agent Continuous Control\",\"authors\":\"Z. Jiao, J. Oh\",\"doi\":\"10.1109/ICMLA.2019.00100\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In end-to-end reinforcement learning, an agent captures the entire mapping from its raw sensor data to actuation commands using a single neural network. End-to-end reinforcement learning is mostly studied in single-agent domains, and its scalability to multi-agent setting is under-explored. Without effective techniques, learning effective policies based on the joint observation of agents can be intractable, particularly when sensor data perceived by each agent is high-dimensional. Extending the multi-agent actor-critic method MADDPG, this paper presents Rec-MADDPG, an end-to-end reinforcement learning method for multi-agent continuous control in a cooperative environment. To ease end-to-end learning in a multi-agent setting, we proposed two embedding mechanisms, joint and independent embedding, to project agents' joint sensor observation to low-dimensional features. For training efficiency, we applied parameter sharing and the A3C-based asynchronous framework to Rec-MADDPG. Considering the challenges that can arise in real-world multi-agent control, we evaluated Rec-MADDPG in robotic navigation tasks based on realistic simulated robots and physics enable environments. 
Through extensive evaluation, we demonstrated that Rec-MADDPG can significantly outperform MADDPG and was able to learn individual end-to-end policies for continuous control based on raw sensor data. In addition, compared to joint embedding, independent embedding enabled Rec-MADDPG to learn even better optimal policies.\",\"PeriodicalId\":436714,\"journal\":{\"name\":\"2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA)\",\"volume\":\"30 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICMLA.2019.00100\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICMLA.2019.00100","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
End-to-End Reinforcement Learning for Multi-agent Continuous Control
In end-to-end reinforcement learning, an agent captures the entire mapping from its raw sensor data to actuation commands with a single neural network. End-to-end reinforcement learning has mostly been studied in single-agent domains, and its scalability to multi-agent settings is under-explored. Without effective techniques, learning policies from the joint observations of multiple agents can be intractable, particularly when the sensor data perceived by each agent is high-dimensional. Extending the multi-agent actor-critic method MADDPG, this paper presents Rec-MADDPG, an end-to-end reinforcement learning method for multi-agent continuous control in cooperative environments. To ease end-to-end learning in a multi-agent setting, we propose two embedding mechanisms, joint and independent embedding, that project the agents' joint sensor observations onto low-dimensional features. For training efficiency, we apply parameter sharing and an A3C-based asynchronous training framework to Rec-MADDPG. Considering the challenges that arise in real-world multi-agent control, we evaluate Rec-MADDPG on robotic navigation tasks with realistic simulated robots and physics-enabled environments. Through extensive evaluation, we demonstrate that Rec-MADDPG significantly outperforms MADDPG and learns individual end-to-end policies for continuous control from raw sensor data. In addition, compared to joint embedding, independent embedding enables Rec-MADDPG to learn better policies.
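The sketch below is a minimal, illustrative contrast between the two embedding mechanisms named in the abstract, not the authors' implementation: "joint" embedding compresses the concatenation of all agents' raw observations with one encoder, while "independent" embedding compresses each agent's observation separately before a MADDPG-style centralized critic consumes the features. The framework choice (PyTorch), layer sizes, observation/action dimensions, and class names (JointEmbedding, IndependentEmbedding, CentralizedCritic) are assumptions for illustration only.

```python
# Hedged sketch: joint vs. independent observation embedding feeding a
# centralized critic, in the spirit of Rec-MADDPG. All sizes and names
# are illustrative assumptions, not taken from the paper.
import torch
import torch.nn as nn


class JointEmbedding(nn.Module):
    """One encoder over the concatenation of all agents' raw observations."""

    def __init__(self, n_agents: int, obs_dim: int, feat_dim: int):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_agents * obs_dim, 256), nn.ReLU(),
            nn.Linear(256, feat_dim), nn.ReLU(),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (batch, n_agents, obs_dim) -> (batch, feat_dim)
        return self.encoder(obs.flatten(start_dim=1))


class IndependentEmbedding(nn.Module):
    """A separate encoder per agent; per-agent features are concatenated."""

    def __init__(self, n_agents: int, obs_dim: int, feat_dim: int):
        super().__init__()
        self.encoders = nn.ModuleList(
            nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                          nn.Linear(128, feat_dim), nn.ReLU())
            for _ in range(n_agents)
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (batch, n_agents, obs_dim) -> (batch, n_agents * feat_dim)
        feats = [enc(obs[:, i]) for i, enc in enumerate(self.encoders)]
        return torch.cat(feats, dim=-1)


class CentralizedCritic(nn.Module):
    """MADDPG-style critic: Q(embedded joint observation, joint action)."""

    def __init__(self, embed: nn.Module, embed_out: int,
                 n_agents: int, act_dim: int):
        super().__init__()
        self.embed = embed
        self.q_head = nn.Sequential(
            nn.Linear(embed_out + n_agents * act_dim, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, obs: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
        # obs: (batch, n_agents, obs_dim); actions: (batch, n_agents, act_dim)
        z = self.embed(obs)
        return self.q_head(torch.cat([z, actions.flatten(start_dim=1)], dim=-1))


if __name__ == "__main__":
    # Illustrative dimensions: high-dimensional raw sensor input per agent.
    n_agents, obs_dim, act_dim, feat_dim = 3, 1024, 2, 64
    obs = torch.randn(8, n_agents, obs_dim)
    actions = torch.randn(8, n_agents, act_dim)

    joint_critic = CentralizedCritic(
        JointEmbedding(n_agents, obs_dim, feat_dim), feat_dim, n_agents, act_dim)
    indep_critic = CentralizedCritic(
        IndependentEmbedding(n_agents, obs_dim, feat_dim),
        n_agents * feat_dim, n_agents, act_dim)
    print(joint_critic(obs, actions).shape, indep_critic(obs, actions).shape)
```

Under these assumptions, the independent variant keeps each agent's features separable before the critic, which is one plausible reading of why the abstract reports it learning better policies than the joint variant.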