Simin Li , Jun Guo , Jingqiao Xiu , Yuwei Zheng , Pu Feng , Xin Yu , Jiakai Wang , Aishan Liu , Yaodong Yang , Bo An , Wenjun Wu , Xianglong Liu
{"title":"Attacking cooperative multi-agent reinforcement learning by adversarial minority influence","authors":"Simin Li , Jun Guo , Jingqiao Xiu , Yuwei Zheng , Pu Feng , Xin Yu , Jiakai Wang , Aishan Liu , Yaodong Yang , Bo An , Wenjun Wu , Xianglong Liu","doi":"10.1016/j.neunet.2025.107747","DOIUrl":null,"url":null,"abstract":"<div><div>This study probes the vulnerabilities of cooperative multi-agent reinforcement learning (c-MARL) under adversarial attacks, a critical determinant of c-MARL’s worst-case performance prior to real-world implementation. Current observation-based attacks, constrained by white-box assumptions, overlook c-MARL’s complex <em>multi-agent</em> interactions and <em>cooperative</em> objectives, resulting in impractical and limited attack capabilities. To address these shortcomes, we propose <em>Adversarial Minority Influence</em> (AMI), a practical and strong for c-MARL. AMI is a practical black-box attack and can be launched without knowing victim parameters. AMI is also strong by considering the complex <em>multi-agent</em> interaction and the <em>cooperative</em> goal of agents, enabling a single adversarial agent to <em>unilaterally</em> misleads majority victims to form <em>targeted</em> worst-case cooperation. This mirrors minority influence phenomena in social psychology. To achieve maximum deviation in victim policies under complex agent-wise interactions, our <em>unilateral</em> attack aims to characterize and maximize the impact of the adversary on the victims. This is achieved by adapting a unilateral agent-wise relation metric derived from mutual information, thereby mitigating the adverse effects of victim influence on the adversary. To lead the victims into a jointly detrimental scenario, our <em>targeted</em> attack deceives victims into a long-term, cooperatively harmful situation by guiding each victim towards a specific target, determined through a trial-and-error process executed by a reinforcement learning agent. Through AMI, we achieve the first successful attack against real-world robot swarms and effectively fool agents in simulated environments into collectively worst-case scenarios, including Starcraft II and Multi-agent Mujoco. The source code and demonstrations can be found at: <span><span>https://github.com/DIG-Beihang/AMI</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"191 ","pages":"Article 107747"},"PeriodicalIF":6.3000,"publicationDate":"2025-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neural Networks","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0893608025006276","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
This study probes the vulnerabilities of cooperative multi-agent reinforcement learning (c-MARL) under adversarial attacks, a critical determinant of c-MARL’s worst-case performance prior to real-world implementation. Current observation-based attacks, constrained by white-box assumptions, overlook c-MARL’s complex multi-agent interactions and cooperative objectives, resulting in impractical and limited attack capabilities. To address these shortcomes, we propose Adversarial Minority Influence (AMI), a practical and strong for c-MARL. AMI is a practical black-box attack and can be launched without knowing victim parameters. AMI is also strong by considering the complex multi-agent interaction and the cooperative goal of agents, enabling a single adversarial agent to unilaterally misleads majority victims to form targeted worst-case cooperation. This mirrors minority influence phenomena in social psychology. To achieve maximum deviation in victim policies under complex agent-wise interactions, our unilateral attack aims to characterize and maximize the impact of the adversary on the victims. This is achieved by adapting a unilateral agent-wise relation metric derived from mutual information, thereby mitigating the adverse effects of victim influence on the adversary. To lead the victims into a jointly detrimental scenario, our targeted attack deceives victims into a long-term, cooperatively harmful situation by guiding each victim towards a specific target, determined through a trial-and-error process executed by a reinforcement learning agent. Through AMI, we achieve the first successful attack against real-world robot swarms and effectively fool agents in simulated environments into collectively worst-case scenarios, including Starcraft II and Multi-agent Mujoco. The source code and demonstrations can be found at: https://github.com/DIG-Beihang/AMI.
期刊介绍:
Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically-inspired artificial intelligence.