Skill matters: Dynamic skill learning for multi-agent cooperative reinforcement learning

IF 6.0 | CAS Region 1, Computer Science | JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Tong Li, Chenjia Bai, Kang Xu, Chen Chu, Peican Zhu, Zhen Wang
{"title":"技能很重要:多代理合作强化学习的动态技能学习","authors":"Tong Li ,&nbsp;Chenjia Bai ,&nbsp;Kang Xu ,&nbsp;Chen Chu ,&nbsp;Peican Zhu ,&nbsp;Zhen Wang","doi":"10.1016/j.neunet.2024.106852","DOIUrl":null,"url":null,"abstract":"<div><div>With the popularization of intelligence, the necessity of cooperation between intelligent machines makes the research of collaborative multi-agent reinforcement learning (MARL) more extensive. Existing approaches typically address this challenge through task decomposition of the environment or role classification of agents. However, these studies may rely on the sharing of parameters between agents, resulting in the homogeneity of agent behavior, which is not effective for complex tasks. Or training that relies on external rewards is difficult to adapt to scenarios with sparse rewards. Based on the above challenges, in this paper we propose a novel dynamic skill learning (DSL) framework for agents to learn more diverse abilities motivated by internal rewards. Specifically, the DSL has two components: (i) Dynamic skill discovery, which encourages the production of meaningful skills by exploring the environment in an unsupervised manner, using the inner product between a skill vector and a trajectory representation to generate intrinsic rewards. Meanwhile, the Lipschitz constraint of the state representation function is used to ensure the proper trajectory of the learned skills. (ii) Dynamic skill assignment, which utilizes a policy controller to assign skills to each agent based on its different trajectory latent variables. In addition, in order to avoid training instability caused by frequent changes in skill selection, we introduce a regularization term to limit skill switching between adjacent time steps. We thoroughly tested the DSL approach on two challenging benchmarks, StarCraft II and Google Research Football. Experimental results show that compared with strong benchmarks such as QMIX and RODE, DSL effectively improves performance and is more adaptable to difficult collaborative scenarios.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"181 ","pages":"Article 106852"},"PeriodicalIF":6.0000,"publicationDate":"2024-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Skill matters: Dynamic skill learning for multi-agent cooperative reinforcement learning\",\"authors\":\"Tong Li ,&nbsp;Chenjia Bai ,&nbsp;Kang Xu ,&nbsp;Chen Chu ,&nbsp;Peican Zhu ,&nbsp;Zhen Wang\",\"doi\":\"10.1016/j.neunet.2024.106852\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>With the popularization of intelligence, the necessity of cooperation between intelligent machines makes the research of collaborative multi-agent reinforcement learning (MARL) more extensive. Existing approaches typically address this challenge through task decomposition of the environment or role classification of agents. However, these studies may rely on the sharing of parameters between agents, resulting in the homogeneity of agent behavior, which is not effective for complex tasks. Or training that relies on external rewards is difficult to adapt to scenarios with sparse rewards. Based on the above challenges, in this paper we propose a novel dynamic skill learning (DSL) framework for agents to learn more diverse abilities motivated by internal rewards. 
Specifically, the DSL has two components: (i) Dynamic skill discovery, which encourages the production of meaningful skills by exploring the environment in an unsupervised manner, using the inner product between a skill vector and a trajectory representation to generate intrinsic rewards. Meanwhile, the Lipschitz constraint of the state representation function is used to ensure the proper trajectory of the learned skills. (ii) Dynamic skill assignment, which utilizes a policy controller to assign skills to each agent based on its different trajectory latent variables. In addition, in order to avoid training instability caused by frequent changes in skill selection, we introduce a regularization term to limit skill switching between adjacent time steps. We thoroughly tested the DSL approach on two challenging benchmarks, StarCraft II and Google Research Football. Experimental results show that compared with strong benchmarks such as QMIX and RODE, DSL effectively improves performance and is more adaptable to difficult collaborative scenarios.</div></div>\",\"PeriodicalId\":49763,\"journal\":{\"name\":\"Neural Networks\",\"volume\":\"181 \",\"pages\":\"Article 106852\"},\"PeriodicalIF\":6.0000,\"publicationDate\":\"2024-11-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Neural Networks\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0893608024007767\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neural Networks","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0893608024007767","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

As intelligent systems become ubiquitous, the need for cooperation among intelligent machines has broadened research on cooperative multi-agent reinforcement learning (MARL). Existing approaches typically address this challenge by decomposing the environment into tasks or classifying agents into roles. However, these methods often rely on parameter sharing between agents, which homogenizes agent behavior and is ineffective for complex tasks; moreover, training driven by external rewards struggles to adapt to sparse-reward scenarios. Motivated by these challenges, we propose a novel dynamic skill learning (DSL) framework in which agents learn more diverse abilities driven by intrinsic rewards. Specifically, DSL has two components: (i) dynamic skill discovery, which encourages the emergence of meaningful skills through unsupervised exploration of the environment, using the inner product between a skill vector and a trajectory representation to generate intrinsic rewards, while a Lipschitz constraint on the state representation function keeps the trajectories of the learned skills well behaved; and (ii) dynamic skill assignment, which uses a policy controller to assign a skill to each agent based on its trajectory latent variable. In addition, to avoid the training instability caused by frequent changes in skill selection, we introduce a regularization term that limits skill switching between adjacent time steps. We evaluate DSL extensively on two challenging benchmarks, StarCraft II and Google Research Football. Experimental results show that, compared with strong baselines such as QMIX and RODE, DSL effectively improves performance and adapts better to difficult cooperative scenarios.
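To make the abstract's two mechanisms concrete, here is a minimal NumPy sketch (our own illustration under stated assumptions, not the authors' released code; the function names, the skill dimension, and the penalty coefficient are hypothetical): the intrinsic reward is the inner product between a skill vector and a trajectory representation, and a simple step-change penalty stands in for the regularization term that limits skill switching between adjacent time steps.

```python
import numpy as np

def intrinsic_reward(skill_vec, traj_repr):
    # Inner product between skill vector z and trajectory representation
    # phi(tau): larger when the rollout realizes the chosen skill.
    return float(np.dot(skill_vec, traj_repr))

def switching_penalty(skill_t, skill_prev, coef=0.1):
    # Hypothetical form of the switching regularizer: pay a fixed cost
    # whenever the skill chosen at step t differs from the one at t-1.
    return coef * float(skill_t != skill_prev)

rng = np.random.default_rng(0)
dim = 8                          # assumed skill/representation dimension
z = rng.normal(size=dim)
z /= np.linalg.norm(z)           # unit-norm skill vector
# In the paper, phi(tau) comes from a learned, Lipschitz-constrained state
# representation; a random vector keeps this sketch self-contained.
phi_tau = rng.normal(size=dim)

r_int = intrinsic_reward(z, phi_tau)
r_total = r_int - switching_penalty(skill_t=3, skill_prev=2)
print(f"intrinsic reward: {r_int:.3f}  regularized: {r_total:.3f}")
```

Tying the reward to this inner product means maximizing it pushes the trajectory representations of different skills apart, which is what makes the discovered skills distinguishable; the penalty term discourages, rather than forbids, mid-episode skill changes.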

Source journal
Neural Networks
Engineering & Technology / Computer Science: Artificial Intelligence
CiteScore: 13.90
Self-citation rate: 7.70%
Articles per year: 425
Review time: 67 days
Journal introduction: Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically-inspired artificial intelligence.