Stochastic Bandits With Non-Stationary Rewards: Reward Attack and Defense
Chenye Yang; Guanlin Liu; Lifeng Lai. IEEE Transactions on Signal Processing, vol. 72, pp. 5007-5020, published 2024-10-25. DOI: 10.1109/TSP.2024.3486240. https://ieeexplore.ieee.org/document/10735146/
In this paper, we investigate reward attacks on stochastic multi-armed bandit algorithms in non-stationary environments. The attacker's goal is to force the victim algorithm to choose a suboptimal arm most of the time while incurring only a small attack cost. We consider three increasingly general attack scenarios, each with different assumptions about the environment, the victim algorithm, and the information available to the attacker. We propose three attack strategies, one for each scenario, and prove that they are successful in terms of both the expected number of target-arm selections and the attack cost. We also propose a non-stationary defense algorithm that can defend against any attacker whose attack cost is bounded by a budget, and prove that it is robust to such attacks. Simulation results validate our theoretical analysis.
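The abstract does not spell out the three attack strategies, so the sketch below is not the authors' method. It is a minimal illustration of what a reward attack on a non-stationary bandit looks like, under hypothetical assumptions: a piecewise-stationary Gaussian environment whose best arm changes halfway through, a sliding-window UCB victim (a standard non-stationary bandit algorithm), and an oracle attacker that knows the current arm means and lowers the observed reward of every non-target arm so that the target looks best by a fixed margin. The attack cost is the total amount of reward shifted. All names (`piecewise_means`, `SlidingWindowUCB`, `oracle_attack`, `run`) and numeric parameters are illustrative choices.

```python
import math
import random
from collections import deque


def piecewise_means(t, horizon):
    """Hypothetical piecewise-stationary environment: the best arm changes halfway."""
    if t < horizon // 2:
        return [0.9, 0.5, 0.2]
    return [0.3, 0.5, 0.8]


class SlidingWindowUCB:
    """Victim: UCB computed only over the last `window` (arm, reward) observations."""

    def __init__(self, n_arms, window, c=0.5):
        self.n_arms = n_arms
        self.window = window
        self.c = c
        self.history = deque()  # (arm, observed_reward) pairs inside the window

    def select(self, t):
        counts = [0] * self.n_arms
        sums = [0.0] * self.n_arms
        for arm, r in self.history:
            counts[arm] += 1
            sums[arm] += r
        for arm in range(self.n_arms):
            if counts[arm] == 0:
                return arm  # play any arm that has no data in the current window
        log_term = math.log(min(t, self.window) + 1)
        ucb = [sums[a] / counts[a] + self.c * math.sqrt(log_term / counts[a])
               for a in range(self.n_arms)]
        return max(range(self.n_arms), key=lambda a: ucb[a])

    def update(self, arm, reward):
        self.history.append((arm, reward))
        if len(self.history) > self.window:
            self.history.popleft()  # forget observations older than the window


def oracle_attack(arm, reward, means, target, margin):
    """Hypothetical oracle attacker: knows current means, pushes non-target arms down."""
    if arm == target:
        return reward, 0.0
    # Lower the reward just enough that this arm's mean sits `margin` below the target's.
    shift = max(0.0, means[arm] - means[target] + margin)
    return reward - shift, shift


def run(horizon=2000, window=200, target=1, margin=0.3, attack=True, seed=0):
    """Return (fraction of rounds the target arm was pulled, total attack cost)."""
    rng = random.Random(seed)
    victim = SlidingWindowUCB(n_arms=3, window=window)
    target_pulls = 0
    cost = 0.0
    for t in range(horizon):
        means = piecewise_means(t, horizon)
        arm = victim.select(t)
        reward = rng.gauss(means[arm], 0.1)  # true stochastic reward
        if attack:
            reward, spent = oracle_attack(arm, reward, means, target, margin)
            cost += spent
        victim.update(arm, reward)  # the victim only sees the (possibly corrupted) reward
        target_pulls += int(arm == target)
    return target_pulls / horizon, cost


if __name__ == "__main__":
    frac, cost = run(attack=True)
    base_frac, _ = run(attack=False)
    print(f"target-arm fraction with attack: {frac:.2f}, attack cost: {cost:.1f}")
    print(f"target-arm fraction without attack: {base_frac:.2f}")
```

Arm 1 (mean 0.5) is never optimal in either phase, yet under the attack the victim pulls it in most rounds, while the cost stays small because only the occasional non-target pulls are corrupted. A defense in the spirit of the abstract would bound how much such budget-limited corruption can shift the victim's window estimates; that part is not sketched here.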
Journal description:
The IEEE Transactions on Signal Processing covers novel theory, algorithms, performance analyses and applications of techniques for the processing, understanding, learning, retrieval, mining, and extraction of information from signals. The term “signal” includes, among others, audio, video, speech, image, communication, geophysical, sonar, radar, medical and musical signals. Examples of topics of interest include, but are not limited to, information processing and the theory and application of filtering, coding, transmitting, estimating, detecting, analyzing, recognizing, synthesizing, recording, and reproducing signals.