CuDA2: An Approach for Incorporating Traitor Agents Into Cooperative Multiagent Systems

Impact Factor 2.8 · CAS Zone 4 (Computer Science) · JCR Q3, Computer Science, Artificial Intelligence
Zhen Chen;Yong Liao;Youpeng Zhao;Zipeng Dai;Jian Zhao
DOI: 10.1109/TG.2024.3485726
Journal: IEEE Transactions on Games, vol. 17, no. 2, pp. 397–407
Published: 2024-10-24
URL: https://ieeexplore.ieee.org/document/10734173/
Citations: 0

Abstract

Cooperative multiagent reinforcement learning (CMARL) strategies are well known to be vulnerable to adversarial perturbations. Previous works on adversarial attacks have primarily focused on glass-box attacks that directly perturb the states or actions of victim agents, often in scenarios with a limited number of attacks. However, gaining complete access to victim agents in real-world environments is exceedingly difficult. To create more realistic adversarial attacks, we introduce a novel method that involves injecting traitor agents into the CMARL system. We model this problem as a traitor Markov decision process (TMDP), where traitors cannot directly attack the victim agents but can influence their formation or positioning through collisions. In TMDP, traitors are trained using the same MARL algorithm as the victim agents, with their reward function set as the negative of the victim agents' reward. Despite this, the training efficiency for traitors remains low because it is challenging for them to directly associate their actions with the victim agents' rewards. To address this issue, we propose the curiosity-driven adversarial attack (CuDA2) framework. CuDA2 enhances the efficiency and aggressiveness of attacks on the specified victim agents' policies while maintaining the optimal policy invariance of the traitors. Specifically, we employ a pretrained random network distillation (RND) module, where the extra reward generated by the RND module encourages traitors to explore states unencountered by the victim agents. Extensive experiments on various scenarios from SMAC demonstrate that our CuDA2 framework offers comparable or superior adversarial attack capabilities compared to other baselines.
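The abstract does not include code, but the traitor reward design it describes (negated victim reward plus an RND curiosity bonus) can be illustrated with a minimal NumPy sketch. Everything here is an assumption for illustration: the linear target/predictor networks, the `beta` weighting coefficient, and the function names are not from the paper, which uses pretrained neural RND networks within a full MARL pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

class RND:
    """Random network distillation sketch: the intrinsic reward is the
    predictor's error in matching a fixed, randomly initialized target
    network. Novel states yield high error, hence high curiosity bonus."""

    def __init__(self, obs_dim, feat_dim=16, lr=0.1):
        # Fixed random target network (never trained).
        self.W_target = rng.normal(0.0, 1.0, (obs_dim, feat_dim))
        # Trainable predictor network, initialized to zero.
        self.W_pred = np.zeros((obs_dim, feat_dim))
        self.lr = lr

    def intrinsic_reward(self, obs):
        # Mean squared prediction error for this observation.
        err = obs @ self.W_pred - obs @ self.W_target
        return float(np.mean(err ** 2))

    def update(self, obs):
        # One gradient step on the squared error; frequently visited
        # states lose their novelty bonus over time.
        err = obs @ self.W_pred - obs @ self.W_target
        self.W_pred -= self.lr * (2.0 / err.size) * np.outer(obs, err)

def traitor_reward(victim_reward, rnd, obs, beta=0.1):
    # Traitor objective from the abstract: the negative of the victim
    # agents' reward, augmented by the RND curiosity bonus (beta is an
    # assumed weighting hyperparameter).
    return -victim_reward + beta * rnd.intrinsic_reward(obs)
```

With repeated `update` calls on the same observation, its intrinsic reward decays toward zero, so the curiosity bonus steers traitors toward states the victim agents have not encountered.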
Source journal: IEEE Transactions on Games (Engineering: Electrical and Electronic Engineering)
CiteScore: 4.60
Self-citation rate: 8.70%
Annual publications: 87