Robust cooperative multi-agent reinforcement learning via multi-view message certification

IF 7.6 | Zone 2 (Computer Science) | JCR Q1 (COMPUTER SCIENCE, INFORMATION SYSTEMS)

Science China Information Sciences | Pub Date: 2024-03-22 | DOI: 10.1007/s11432-023-3853-y

Lei Yuan, Tao Jiang, Lihe Li, Feng Chen, Zongzhang Zhang, Yang Yu


Abstract

Many multi-agent scenarios require agents to share messages to promote coordination, which makes robust multi-agent communication essential when policies are deployed in environments where messages may be perturbed. Most related studies tackle this issue under specific assumptions, for example that only a limited number of message channels are perturbed, which limits their effectiveness in complex scenarios. In this paper, we take a further step toward addressing this issue with robust cooperative multi-agent reinforcement learning via multi-view message certification, dubbed CroMAC. Agents trained with CroMAC can obtain guaranteed lower bounds on state-action values, which they use to identify and choose the optimal action under a worst-case deviation when the received messages are perturbed. Concretely, we first model multi-agent communication as a multi-view problem, where every message stands for a view of the state. We then extract a certified joint message representation using a multi-view variational autoencoder (MVAE) with a product-of-experts inference network. During optimization, we apply perturbations in the latent space of the state to obtain the certification guarantee. During training, the learned joint message representation is then used to approximate the certified state representation. Extensive experiments on several cooperative multi-agent benchmarks validate the effectiveness of CroMAC.
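The abstract names two mechanisms: fusing per-message encodings with a product-of-experts inference network, and choosing actions by certified lower bounds on state-action values under a bounded latent perturbation. The sketch below illustrates both under stated assumptions, not as CroMAC's implementation: each message is assumed to be already encoded as a diagonal Gaussian expert, a standard-normal prior expert is included (a common choice in product-of-experts MVAE inference networks), the perturbation set is assumed to be an l_inf ball of radius `eps` around the fused latent mean, and the bounds are computed with interval bound propagation (IBP) through a hypothetical one-hidden-layer ReLU Q head. IBP is a stand-in chosen for illustration; the abstract does not say how CroMAC computes its bounds. All function names, weights, and numbers are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def product_of_experts(mus: Sequence[Vector], logvars: Sequence[Vector]) -> Tuple[Vector, Vector]:
    """Fuse diagonal Gaussian experts N(mu_i, exp(logvar_i)), one per received
    message, with a standard-normal prior expert N(0, I).

    The product of Gaussian densities is (up to normalization) Gaussian: its
    precision is the sum of the experts' precisions, and its mean is the
    precision-weighted average of the experts' means.
    """
    dim = len(mus[0])
    fused_mu: Vector = []
    fused_var: Vector = []
    for d in range(dim):
        precision = 1.0  # prior expert N(0, 1): precision 1, mean 0
        weighted_sum = 0.0
        for mu, logvar in zip(mus, logvars):
            p = math.exp(-logvar[d])  # precision = 1 / variance
            precision += p
            weighted_sum += p * mu[d]
        fused_var.append(1.0 / precision)
        fused_mu.append(weighted_sum / precision)
    return fused_mu, fused_var


def interval_affine(lo: Vector, hi: Vector, W: Matrix, b: Vector) -> Tuple[Vector, Vector]:
    """Exact per-output bounds of W x + b over the box lo <= x <= hi."""
    out_lo: Vector = []
    out_hi: Vector = []
    for row, bias in zip(W, b):
        center = sum(w * (x_lo + x_hi) / 2 for w, x_lo, x_hi in zip(row, lo, hi))
        radius = sum(abs(w) * (x_hi - x_lo) / 2 for w, x_lo, x_hi in zip(row, lo, hi))
        out_lo.append(center + bias - radius)
        out_hi.append(center + bias + radius)
    return out_lo, out_hi


def q_values(z: Vector, W1: Matrix, b1: Vector, W2: Matrix, b2: Vector) -> Vector:
    """Forward pass of a hypothetical one-hidden-layer ReLU Q head."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, z)) + bias) for row, bias in zip(W1, b1)]
    return [sum(w * x for w, x in zip(row, hidden)) + bias for row, bias in zip(W2, b2)]


def q_lower_bounds(z: Vector, eps: float, W1: Matrix, b1: Vector, W2: Matrix, b2: Vector) -> Vector:
    """Sound lower bounds on every Q-value over the l_inf ball of radius eps
    around z, computed with interval bound propagation (IBP)."""
    lo = [zi - eps for zi in z]
    hi = [zi + eps for zi in z]
    lo, hi = interval_affine(lo, hi, W1, b1)
    # ReLU is monotone, so applying it to both ends keeps the bounds valid.
    lo = [max(0.0, v) for v in lo]
    hi = [max(0.0, v) for v in hi]
    lower, _ = interval_affine(lo, hi, W2, b2)
    return lower


def certified_action(z: Vector, eps: float, W1: Matrix, b1: Vector, W2: Matrix, b2: Vector) -> int:
    """Choose the action with the largest worst-case (lower-bounded) Q-value."""
    lower = q_lower_bounds(z, eps, W1, b1, W2, b2)
    return max(range(len(lower)), key=lambda a: lower[a])


if __name__ == "__main__":
    # Two received messages, each already encoded as a 2-D Gaussian expert (made-up numbers).
    z, _ = product_of_experts([[0.5, -0.2], [0.7, 0.1]], [[0.0, 0.0], [-1.0, 0.0]])
    W1, b1 = [[1.0, -1.0], [0.5, 2.0]], [0.0, 0.1]
    W2, b2 = [[1.0, 0.0], [0.2, 0.8]], [0.0, 0.05]
    print("fused latent:", z)
    print("nominal Q:", q_values(z, W1, b1, W2, b2))
    print("certified action (eps=0.1):", certified_action(z, 0.1, W1, b1, W2, b2))
```

Acting on the lower bounds rather than the nominal Q-values is what makes the choice robust: as long as the true latent stays inside the assumed ball, the chosen action's value is at least its certified bound. With `eps = 0` the bounds collapse to the nominal Q-values.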




Source journal

Science China Information Sciences (COMPUTER SCIENCE, INFORMATION SYSTEMS)

CiteScore: 12.60
Self-citation rate: 5.70%
Articles published: 224
Review time: 8.3 months

About the journal: Science China Information Sciences is a dedicated journal that showcases high-quality, original research across various domains of the information sciences. It covers Computer Science & Technologies, Control Science & Engineering, Information & Communication Engineering, Microelectronics & Solid-State Electronics, and Quantum Information, providing a platform for disseminating significant contributions in these fields.