End-to-End Reinforcement Learning for Multi-agent Continuous Control

Z. Jiao, J. Oh
{"title":"End-to-End Reinforcement Learning for Multi-agent Continuous Control","authors":"Z. Jiao, J. Oh","doi":"10.1109/ICMLA.2019.00100","DOIUrl":null,"url":null,"abstract":"In end-to-end reinforcement learning, an agent captures the entire mapping from its raw sensor data to actuation commands using a single neural network. End-to-end reinforcement learning is mostly studied in single-agent domains, and its scalability to multi-agent setting is under-explored. Without effective techniques, learning effective policies based on the joint observation of agents can be intractable, particularly when sensor data perceived by each agent is high-dimensional. Extending the multi-agent actor-critic method MADDPG, this paper presents Rec-MADDPG, an end-to-end reinforcement learning method for multi-agent continuous control in a cooperative environment. To ease end-to-end learning in a multi-agent setting, we proposed two embedding mechanisms, joint and independent embedding, to project agents' joint sensor observation to low-dimensional features. For training efficiency, we applied parameter sharing and the A3C-based asynchronous framework to Rec-MADDPG. Considering the challenges that can arise in real-world multi-agent control, we evaluated Rec-MADDPG in robotic navigation tasks based on realistic simulated robots and physics enable environments. Through extensive evaluation, we demonstrated that Rec-MADDPG can significantly outperform MADDPG and was able to learn individual end-to-end policies for continuous control based on raw sensor data. In addition, compared to joint embedding, independent embedding enabled Rec-MADDPG to learn even better optimal policies.","PeriodicalId":436714,"journal":{"name":"2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICMLA.2019.00100","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

In end-to-end reinforcement learning, an agent captures the entire mapping from raw sensor data to actuation commands in a single neural network. End-to-end reinforcement learning has mostly been studied in single-agent domains, and its scalability to multi-agent settings is under-explored. Without effective techniques, learning effective policies from agents' joint observations can be intractable, particularly when the sensor data perceived by each agent is high-dimensional. Extending the multi-agent actor-critic method MADDPG, this paper presents Rec-MADDPG, an end-to-end reinforcement learning method for multi-agent continuous control in cooperative environments. To ease end-to-end learning in a multi-agent setting, we propose two embedding mechanisms, joint embedding and independent embedding, that project the agents' joint sensor observations onto low-dimensional features. For training efficiency, we apply parameter sharing and an A3C-style asynchronous training framework to Rec-MADDPG. To reflect the challenges that arise in real-world multi-agent control, we evaluate Rec-MADDPG on robotic navigation tasks with realistically simulated robots in physics-enabled environments. Extensive evaluation shows that Rec-MADDPG significantly outperforms MADDPG and learns individual end-to-end policies for continuous control from raw sensor data. In addition, compared with joint embedding, independent embedding enables Rec-MADDPG to learn better-performing policies.
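To make the two embedding mechanisms concrete, below is a minimal PyTorch sketch of how joint and independent embedding could project agents' raw observations onto low-dimensional features. Everything here is an illustrative assumption rather than the paper's actual architecture: the module names, the layer widths, and the dimensions N_AGENTS, OBS_DIM, and EMBED_DIM are all invented for the example. The independent variant shares one encoder across agents, mirroring the parameter sharing the abstract mentions.

# Hypothetical sketch of the two embedding mechanisms described in the
# abstract. All names, layer sizes, and dimensions are illustrative
# assumptions, not the authors' implementation.
import torch
import torch.nn as nn

N_AGENTS = 3     # assumed number of cooperating agents
OBS_DIM = 180    # assumed raw sensor dimension per agent (e.g. a lidar scan)
EMBED_DIM = 32   # assumed size of the low-dimensional feature

class JointEmbedding(nn.Module):
    """One network embeds the concatenated joint observation of all agents."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_AGENTS * OBS_DIM, 128), nn.ReLU(),
            nn.Linear(128, EMBED_DIM),
        )

    def forward(self, joint_obs):           # (batch, N_AGENTS * OBS_DIM)
        return self.net(joint_obs)          # (batch, EMBED_DIM)

class IndependentEmbedding(nn.Module):
    """Each agent's observation is embedded separately with a shared encoder
    (parameter sharing), then the per-agent features are concatenated."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 64), nn.ReLU(),
            nn.Linear(64, EMBED_DIM),
        )

    def forward(self, joint_obs):           # (batch, N_AGENTS, OBS_DIM)
        feats = self.net(joint_obs)         # (batch, N_AGENTS, EMBED_DIM)
        return feats.flatten(start_dim=1)   # (batch, N_AGENTS * EMBED_DIM)

if __name__ == "__main__":
    batch = 8
    joint = JointEmbedding()(torch.randn(batch, N_AGENTS * OBS_DIM))
    indep = IndependentEmbedding()(torch.randn(batch, N_AGENTS, OBS_DIM))
    print(joint.shape, indep.shape)  # torch.Size([8, 32]) torch.Size([8, 96])

Either embedding's output would then feed the downstream actor and critic networks; the independent variant keeps each agent's feature extraction local, which is one plausible reason it scales better as observations grow high-dimensional.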