{"title":"Power-of-2-arms for bandit learning with switching costs","authors":"Ming Shi, Xiaojun Lin, Lei Jiao","doi":"10.1145/3492866.3549720","DOIUrl":null,"url":null,"abstract":"Motivated by edge computing with artificial intelligence, in this paper we study a bandit-learning problem with switching costs. Existing results in the literature either incur [EQUATION] regret with bandit feedback, or rely on free full-feedback in order to reduce the regret to [EQUATION]. In contrast, we expand our study to incorporate two new factors. First, full feedback could incur a cost. Second, the player may choose 2 (or more) arms at a time, in which case she is free to use any one of the chosen arms to calculate loss, and switching costs are incurred only when she changes the set of chosen arms. For the setting where the player pulls only one arm at a time, our new regret lower-bound shows that, even when costly full-feedback is added, the [EQUATION] regret still cannot be improved. However, the dependence on the number of arms may be improved when the full-feedback cost is small. In contrast, for the setting where the player can choose 2 (or more) arms at a time, we provide a novel online learning algorithm that achieves a lower [EQUATION] regret. Further, our new algorithm does not need any full feedback at all. This sharp difference therefore reveals the surprising power of choosing 2 (or more) arms for this type of bandit-learning problems with switching costs. Both our new algorithm and regret analysis involve several new ideas, which may be of independent interest.","PeriodicalId":335155,"journal":{"name":"Proceedings of the Twenty-Third International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing","volume":"45 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Twenty-Third International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3492866.3549720","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 6
Abstract
Motivated by edge computing with artificial intelligence, in this paper we study a bandit-learning problem with switching costs. Existing results in the literature either incur $\tilde{\Theta}(T^{2/3})$ regret with bandit feedback, or rely on free full feedback to reduce the regret to $\tilde{O}(\sqrt{T})$. In contrast, we expand our study to incorporate two new factors. First, full feedback could incur a cost. Second, the player may choose 2 (or more) arms at a time, in which case she is free to use any one of the chosen arms to calculate the loss, and switching costs are incurred only when she changes the set of chosen arms. For the setting where the player pulls only one arm at a time, our new regret lower bound shows that, even when costly full feedback is added, the $\tilde{\Theta}(T^{2/3})$ regret still cannot be improved. However, the dependence on the number of arms may be improved when the full-feedback cost is small. In contrast, for the setting where the player can choose 2 (or more) arms at a time, we provide a novel online learning algorithm that achieves a lower $\tilde{O}(\sqrt{T})$ regret. Further, our new algorithm does not need any full feedback at all. This sharp difference therefore reveals the surprising power of choosing 2 (or more) arms for this type of bandit-learning problem with switching costs. Both our new algorithm and our regret analysis involve several new ideas, which may be of independent interest.
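To make the 2-arm setting concrete, the following is a minimal Python sketch of the interaction protocol the abstract describes: the player maintains a set of 2 arms, uses one of them to incur (and observe) a loss under bandit feedback, and pays a switching cost only when the chosen set itself changes. This is not the paper's algorithm; the batched switching schedule, the cost value, and the i.i.d. loss model are illustrative assumptions only.

```python
import numpy as np

# Hypothetical sketch of the two-arm bandit setting with switching costs.
# Assumptions (not from the paper): K, T, SWITCH_COST, random losses,
# and a naive batched update rule in place of the paper's algorithm.

rng = np.random.default_rng(0)
K = 5              # number of arms (assumed)
T = 1000           # time horizon (assumed)
SWITCH_COST = 1.0  # cost paid only when the chosen *set* changes (assumed value)

losses = rng.random((T, K))   # stand-in for an adversarial loss sequence

total_cost = 0.0
chosen_set = {0, 1}           # current set of 2 chosen arms
for t in range(T):
    # A real learner would update chosen_set adaptively; here we switch
    # only at batch boundaries to keep the switching cost visible.
    if t % 100 == 0 and t > 0:
        new_set = set(rng.choice(K, size=2, replace=False).tolist())
        if new_set != chosen_set:
            total_cost += SWITCH_COST  # no cost if the set is unchanged
        chosen_set = new_set
    used_arm = min(chosen_set)          # free to use either chosen arm
    total_cost += losses[t, used_arm]   # bandit feedback: only this loss is seen

# Regret is measured against the best single fixed arm in hindsight.
best_fixed = losses.sum(axis=0).min()
print("regret:", total_cost - best_fixed)
```

The point of the sketch is the accounting: because the loss may come from either arm in the chosen set, the learner can alternate within the set without ever triggering the switching cost, which is the structural advantage the abstract attributes to choosing 2 (or more) arms.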