Deep Multitask Multiagent Reinforcement Learning With Knowledge Transfer

IF 1.7 · Zone 4 Computer Science · Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Games Pub Date : 2023-09-19 DOI:10.1109/TG.2023.3316697

Yuxiang Mai;Yifan Zang;Qiyue Yin;Wancheng Ni;Kaiqi Huang

IEEE Transactions on Games, vol. 16, no. 3, pp. 566-576. https://ieeexplore.ieee.org/document/10255234/

Citations: 0

Abstract

Despite the potential of multiagent reinforcement learning (MARL) in addressing numerous complex tasks, training a single team of MARL agents to handle multiple diverse team tasks remains a challenge. In this article, we introduce a novel Multitask method based on Knowledge Transfer in cooperative MARL (MKT-MARL). By learning from task-specific teachers, our approach empowers a single team of agents to attain expert-level performance in multiple tasks. MKT-MARL utilizes a knowledge distillation algorithm specifically designed for the multiagent architecture, which rapidly learns a team control policy incorporating common coordinated knowledge from the experience of task-specific teachers. In addition, we enhance this training with teacher annealing, gradually shifting the model's learning from distillation toward environmental rewards. This enhancement helps the multitask model surpass its single-task teachers. We extensively evaluate our algorithm using two commonly used benchmarks: StarCraft II micromanagement and the multiagent particle environment. The experimental results demonstrate that our algorithm outperforms both the single-task teachers and a jointly trained team of agents. Extensive ablation experiments illustrate the effectiveness of the supervised knowledge transfer and the teacher annealing strategy.
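The abstract describes teacher annealing only at a high level, so the following is a minimal sketch of the general idea rather than the authors' implementation. It assumes a linear annealing schedule, a KL-divergence distillation term between teacher and student action distributions, and a scalar placeholder `rl_loss` standing in for whatever reinforcement learning objective the environment reward produces. All function names and parameters here are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw action scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def teacher_weight(step, total_steps):
    """Assumed linear schedule: 1.0 at step 0, falling to 0.0 at total_steps."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return 1.0 - frac


def annealed_loss(student_logits, teacher_logits, rl_loss, step, total_steps):
    """Blend a distillation term with an RL term.

    Early in training the teacher term dominates; by the end only the
    reward-driven term remains. The blend form is an assumption.
    """
    lam = teacher_weight(step, total_steps)
    distill = kl_divergence(softmax(teacher_logits), softmax(student_logits))
    return lam * distill + (1.0 - lam) * rl_loss
```

With this schedule, a student that matches its teacher exactly incurs zero loss at step 0, while at the final step the loss equals the RL term alone, which is one way a multitask student could keep improving past its teachers.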

Source journal

IEEE Transactions on Games Engineering-Electrical and Electronic Engineering

CiteScore

4.60

Self-citation rate

8.70%

Number of published articles