Adversarial Attacks on Heterogeneous Multi-Agent Deep Reinforcement Learning System with Time-Delayed Data Transmission

Neshat Elhami Fard, R. Selmic
Journal: J. Sens. Actuator Networks
Publication date: 2022-08-09
Publication type: Journal Article
DOI: 10.3390/jsan11030045 (https://doi.org/10.3390/jsan11030045)
Citations: 3

Abstract

This paper studies gradient-based adversarial attacks on cluster-based, heterogeneous, multi-agent deep reinforcement learning (MADRL) systems with time-delayed data transmission. The MADRL system is organized into clusters of agents. The first cluster's agent uses a deep Q-network (DQN) architecture, and the other clusters are treated as the environment of that DQN agent. We introduce two novel observations for data transmission, termed on-time and time-delay observations. They apply when the data transmission channel is idle and the data is transmitted either on time or with a delay. To improve MADRL system performance, we present a novel immediate reward function that appends a distance-based reward, computed from the distance between neighboring agents, to the previously used reward. We consider three types of gradient-based attacks to investigate the robustness of the proposed system's data transmission. Two defense methods are proposed to reduce the effects of these attacks. We rigorously show the system performance in terms of the DQN loss and the team reward for the entire team of agents, and we demonstrate the effects of the various attacks before and after applying the defense algorithms. The theoretical results are illustrated and verified with simulation examples.
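As a rough illustration of what a gradient-based attack on a Q-function's observation can look like, the sketch below applies a sign-gradient perturbation in the style of the fast gradient sign method. The abstract does not name the three attacks studied, so this choice is an assumption. The linear Q-function, the weights `W` and `b`, the observation `obs`, and the budget `epsilon` are likewise hypothetical stand-ins, not the paper's DQN or data.

```python
def q_values(weights, bias, obs):
    """Linear stand-in for a Q-network: returns one Q-value per action."""
    return [sum(w * x for w, x in zip(row, obs)) + b
            for row, b in zip(weights, bias)]


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def greedy_action(q):
    """Index of the largest Q-value."""
    return max(range(len(q)), key=q.__getitem__)


def sign_gradient_attack(weights, bias, obs, epsilon):
    """Shift each observation coordinate by at most epsilon so that the
    Q-value of the currently greedy action decreases."""
    greedy = greedy_action(q_values(weights, bias, obs))
    # For a linear model, the gradient of Q[greedy] with respect to the
    # observation is simply the weight row of the greedy action.
    grad = weights[greedy]
    return [x - epsilon * sign(g) for x, g in zip(obs, grad)]


# Hypothetical example with 2 actions and 3 observation features.
W = [[1.0, -0.5, 0.2], [0.3, 0.8, -0.1]]
b = [0.0, 0.1]
obs = [0.5, 0.2, 0.1]
epsilon = 0.1

adv_obs = sign_gradient_attack(W, b, obs, epsilon)
clean_q = q_values(W, b, obs)
adv_q = q_values(W, b, adv_obs)
print("clean greedy action:", greedy_action(clean_q))
print("attacked greedy action:", greedy_action(adv_q))
```

With these illustrative numbers, a perturbation of at most 0.1 per feature is enough to change which action has the highest Q-value. A real DQN would need automatic differentiation to obtain the gradient, which the linear model here sidesteps.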