Yiren Zhao, Ilia Shumailov, Han Cui, Xitong Gao, R. Mullins, Ross Anderson
{"title":"Blackbox Attacks on Reinforcement Learning Agents Using Approximated Temporal Information","authors":"Yiren Zhao, Ilia Shumailov, Han Cui, Xitong Gao, R. Mullins, Ross Anderson","doi":"10.1109/DSN-W50199.2020.00013","DOIUrl":null,"url":null,"abstract":"Recent research on reinforcement learning (RL) has suggested that trained agents are vulnerable to maliciously-crafted adversarial samples. In this work, we show how such samples can be generalised from White-box and Grey-box attacks to a strong Black-box case, where the attacker has no knowledge of the agents, their training parameters or their training methods. We use sequence-to-sequence models to predict a single action or a sequence of future actions that a trained agent will make. First, we show that our approximation model, based on time-series information from the agent, consistently predicts RL agents’ future actions with high accuracy in a Black-box setup on a wide range of games and RL algorithms. Second, we find that although adversarial samples are transferable from the sequence-to-sequence model to our RL agents, they often outperform Random Gaussian Noise only marginally. Third, we propose a novel use for adversarial samples in Black-box attacks of RL agents: they can be used to trigger a trained agent to misbehave after a specific time delay. This potentially enables an attacker to use devices controlled by RL agents as time bombs.","PeriodicalId":427687,"journal":{"name":"2020 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"26","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DSN-W50199.2020.00013","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 26
Abstract
Recent research on reinforcement learning (RL) has suggested that trained agents are vulnerable to maliciously crafted adversarial samples. In this work, we show how such samples can be generalised from White-box and Grey-box attacks to a strong Black-box case, where the attacker has no knowledge of the agents, their training parameters or their training methods. We use sequence-to-sequence models to predict a single action or a sequence of future actions that a trained agent will make. First, we show that our approximation model, based on time-series information from the agent, consistently predicts RL agents’ future actions with high accuracy in a Black-box setup on a wide range of games and RL algorithms. Second, we find that although adversarial samples are transferable from the sequence-to-sequence model to our RL agents, they often outperform random Gaussian noise only marginally. Third, we propose a novel use for adversarial samples in Black-box attacks on RL agents: they can be used to trigger a trained agent to misbehave after a specific time delay. This potentially enables an attacker to use devices controlled by RL agents as time bombs.
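The core black-box ingredient described above is an approximation model that watches the victim agent and learns to predict its future actions from recent observations. The sketch below is an illustrative assumption of how such a predictor could look, not the authors' implementation: a small LSTM encoder over an observation window with one classification head per future step. All names, dimensions and hyperparameters (ActionPredictor, obs_dim, horizon, etc.) are hypothetical.

```python
# Minimal sketch (assumed, not the paper's code): an LSTM-based approximation
# model that predicts a black-box RL agent's next few discrete actions from a
# window of past observations.
import torch
import torch.nn as nn


class ActionPredictor(nn.Module):
    """Maps a sequence of past observations to predicted future actions."""

    def __init__(self, obs_dim, n_actions, hidden_dim=128, horizon=4):
        super().__init__()
        self.encoder = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.horizon = horizon
        # One classification head per predicted future step.
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_dim, n_actions) for _ in range(horizon)]
        )

    def forward(self, obs_seq):
        # obs_seq: (batch, window, obs_dim) -- observations recorded while
        # passively watching the victim agent interact with the environment.
        _, (h, _) = self.encoder(obs_seq)
        h = h[-1]  # final hidden state, shape (batch, hidden_dim)
        # Return logits of shape (batch, horizon, n_actions).
        return torch.stack([head(h) for head in self.heads], dim=1)


if __name__ == "__main__":
    # Toy training step on stand-in data shaped like (observation window,
    # observed future actions) pairs harvested from the victim agent.
    obs_dim, n_actions, window, horizon = 16, 6, 8, 4
    model = ActionPredictor(obs_dim, n_actions, horizon=horizon)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    obs = torch.randn(32, window, obs_dim)                        # fake observations
    future_actions = torch.randint(0, n_actions, (32, horizon))   # fake action labels

    logits = model(obs)
    loss = loss_fn(logits.reshape(-1, n_actions), future_actions.reshape(-1))
    loss.backward()
    opt.step()
```

Once such a surrogate predicts the agent's behaviour well, standard transfer-attack recipes (crafting adversarial perturbations against the surrogate and feeding them to the real agent) can be applied without any access to the agent's weights or training procedure, which is the setting the abstract evaluates.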