Adversarial Attacks on Heterogeneous Multi-Agent Deep Reinforcement Learning System with Time-Delayed Data Transmission

J. Sens. Actuator Networks. Pub Date: 2022-08-09. DOI: 10.3390/jsan11030045

Neshat Elhami Fard, R. Selmic


Citations: 3

Abstract

This paper studies gradient-based adversarial attacks on cluster-based, heterogeneous, multi-agent deep reinforcement learning (MADRL) systems with time-delayed data transmission. The MADRL system consists of several clusters of agents. The agent structure of the first cluster is a deep Q-network (DQN) architecture, and the other clusters are treated as the environment of the first cluster's DQN agent. We introduce two novel observations in data transmission, termed on-time and time-delay observations. These observations apply when the data transmission channel is idle and the data is transmitted either on time or with a delay. To improve MADRL system performance, we present a novel immediate reward function that appends a reward based on the distance between neighboring agents to the previously used reward. We consider three types of gradient-based attacks to investigate the robustness of the proposed system's data transmission. Two defense methods are proposed to reduce the effects of these attacks. We rigorously show the system performance based on the DQN loss and the team reward for the entire team of agents. Moreover, we demonstrate the effects of the various attacks before and after applying the defense algorithms. The theoretical results are illustrated and verified with simulation examples.
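As a rough illustration of what a gradient-based attack on an agent's observation can look like, the sketch below applies a fast-gradient-sign style perturbation to the input of a Q-function. The abstract does not name the three attacks the paper studies, so this technique is an assumption chosen as a common representative. The linear Q-function is a hypothetical stand-in for a DQN, and the names `q_values`, `fgsm_observation_attack`, and `epsilon` are illustrative only.

```python
import numpy as np


def q_values(W, b, obs):
    """Linear Q-function (hypothetical stand-in for a DQN): one value per action."""
    return W @ obs + b


def fgsm_observation_attack(W, b, obs, epsilon):
    """Shift each observation component by epsilon against the greedy action's Q-value.

    For a linear Q-function the gradient of Q[a] with respect to obs is W[a],
    so no automatic differentiation is needed in this sketch.
    """
    greedy = int(np.argmax(q_values(W, b, obs)))
    grad = W[greedy]  # dQ(obs, greedy) / d obs
    # Step against the gradient sign to lower the greedy action's Q-value.
    return obs - epsilon * np.sign(grad)


rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))  # 3 actions, 4 observation features (assumed sizes)
b = np.zeros(3)
obs = rng.normal(size=4)
epsilon = 0.1

adv_obs = fgsm_observation_attack(W, b, obs, epsilon)
greedy = int(np.argmax(q_values(W, b, obs)))
print("Q before:", q_values(W, b, obs)[greedy])
print("Q after: ", q_values(W, b, adv_obs)[greedy])
```

The perturbation is bounded by `epsilon` in every component, which is the usual way such attacks are kept small relative to the clean observation. With a real DQN, the gradient would come from backpropagation through the network instead of being read off the weight matrix.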
