Risk Control of Best Arm Identification in Multi-armed Bandits via Successive Rejects
Xiaotian Yu, Irwin King, Michael R. Lyu
2017 IEEE International Conference on Data Mining (ICDM), November 2017
DOI: 10.1109/ICDM.2017.153 (https://doi.org/10.1109/ICDM.2017.153)
Citations: 3
Abstract
Best arm identification in stochastic Multi-Armed Bandits (MAB) has become an important variant in bandit research on decision-making problems. In previous work, the best arm usually refers to the arm with the highest expected payoff in a given set of arms. However, in many practical scenarios it is important to also account for the risk of an arm when choosing the best one. Motivated by risk-sensitive applications of bandits, this paper investigates the problem of Risk Control of Best Arm Identification (RCBAI) in stochastic MAB. Based on the technique of Successive Rejects (SR), we show that, under mild assumptions on the stochastic payoffs of the arms, the error resulting from mean-variance estimation is sub-Gamma. We further develop an algorithm named RCMAB.SR and derive an upper bound on its probability of error for RCBAI in stochastic MAB. We demonstrate the superiority of RCMAB.SR on synthetic datasets, and then apply it to financial data for yearly investments to show its superiority in practical applications.
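The abstract names Successive Rejects and mean-variance estimation but does not state the RCMAB.SR procedure itself. The Python sketch below is only a rough illustration. It runs the standard Successive Rejects phase schedule, and at the end of each phase it rejects the arm with the largest empirical mean-variance. The risk measure sigma^2 - rho * mu is an assumption taken from common mean-variance bandit formulations and is not necessarily the paper's definition. Lower values are better, and rho >= 0 is a risk-tolerance parameter. The sub-Gamma error analysis is not reproduced. All function and parameter names (`log_bar`, `empirical_mean_variance`, `successive_rejects_mv`, `rho`) are hypothetical.

```python
import math
import random
from typing import Callable, Dict, List


def log_bar(k: int) -> float:
    """Successive Rejects normalizer: 1/2 + sum_{i=2}^{k} 1/i."""
    return 0.5 + sum(1.0 / i for i in range(2, k + 1))


def empirical_mean_variance(samples: List[float], rho: float) -> float:
    """ASSUMED risk measure: empirical variance minus rho times empirical mean.

    Lower is better; rho >= 0 trades expected payoff against variance.
    Uses the biased (divide-by-m) variance estimate.
    """
    m = len(samples)
    mu = sum(samples) / m
    var = sum((x - mu) ** 2 for x in samples) / m
    return var - rho * mu


def successive_rejects_mv(
    arms: List[Callable[[], float]], budget: int, rho: float
) -> int:
    """Hypothetical SR variant; returns the index of the surviving arm.

    Phase p (p = 1..K-1) brings every active arm up to
    n_p = ceil((budget - K) / (log_bar(K) * (K + 1 - p))) total pulls, then
    rejects the active arm with the LARGEST empirical mean-variance
    (the worst arm under the assumed criterion).
    """
    k = len(arms)
    if budget <= k:
        # Ensures n_1 >= 1, so every arm has at least one sample.
        raise ValueError("budget must exceed the number of arms")
    lb = log_bar(k)
    active = list(range(k))
    samples: Dict[int, List[float]] = {i: [] for i in active}
    n_prev = 0
    for phase in range(1, k):
        n_phase = math.ceil((budget - k) / (lb * (k + 1 - phase)))
        for i in active:
            # Top up each surviving arm to n_phase total pulls.
            for _ in range(n_phase - n_prev):
                samples[i].append(arms[i]())
        n_prev = n_phase
        worst = max(active, key=lambda i: empirical_mean_variance(samples[i], rho))
        active.remove(worst)
    return active[0]


# Usage: three Gaussian arms, each given as (mean, standard deviation).
rng = random.Random(0)
specs = [(1.0, 0.1), (1.2, 1.0), (0.5, 0.1)]
arms = [lambda mu=mu, sd=sd: rng.gauss(mu, sd) for mu, sd in specs]
chosen = successive_rejects_mv(arms, budget=3000, rho=1.0)
print("selected arm:", chosen)
```

With rho = 1, the population mean-variances are -0.99, -0.2 and -0.49. Arm 1 has the highest mean, but its large variance makes it the worst choice under this criterion. With this budget, the sketch should return arm 0 with high probability. Only the rejection rule differs from plain Successive Rejects, which drops the arm with the lowest empirical mean.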