Stochastic continuum-armed bandits with additive models: Minimax regrets and adaptive algorithm
T. Cai, Hongming Pu. The Annals of Statistics, August 2022. DOI: 10.1214/22-aos2182 (https://doi.org/10.1214/22-aos2182)
We consider $d$-dimensional stochastic continuum-armed bandits (SCAB) whose expected reward function is additive and $\beta$-Hölder with sparsity $s$, for $0 < \beta < \infty$ and $1 \le s \le d$. We establish the rate of convergence $\tilde{O}\big(s \cdot T^{(\beta+1)/(2\beta+1)}\big)$ for the minimax regret, where $T$ is the number of rounds. In particular, the minimax regret does not depend on $d$ and grows linearly in $s$. A novel algorithm is proposed and shown to be rate-optimal up to a logarithmic factor in $T$. The problem of adaptivity is also studied. We obtain a lower bound on the cost of adapting to the smoothness, and this result implies that adaptation for free is impossible in general without further structural assumptions. We then consider adaptive additive SCAB under an additional self-similarity assumption. An adaptive procedure is constructed and shown to simultaneously achieve the minimax regret for a range of smoothness levels.
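The following is a minimal sketch of the setting only, not the authors' algorithm. It assumes arms live in $[0,1]^d$, that the expected reward is a sum of one-dimensional components over a hypothetical active set of $s$ coordinates, and that regret is measured against expected (noise-free) rewards. It also evaluates the rate $s \cdot T^{(\beta+1)/(2\beta+1)}$ with constants and the logarithmic factor dropped, which makes two claims of the abstract concrete: $d$ never enters the formula, and doubling $s$ doubles the bound. All function names are hypothetical.

```python
import math
from typing import Callable, Dict, Sequence

# A one-dimensional component f_j: [0, 1] -> R (hypothetical illustration).
Component = Callable[[float], float]


def additive_reward(components: Dict[int, Component]) -> Callable[[Sequence[float]], float]:
    """Return f(x) = sum_j f_j(x[j]) over the active coordinates j only.

    Coordinates not listed in `components` do not affect the reward,
    so the number of keys plays the role of the sparsity s.
    """
    def f(x: Sequence[float]) -> float:
        return sum(f_j(x[j]) for j, f_j in components.items())
    return f


def best_value_on_grid(components: Dict[int, Component], n: int) -> float:
    """Approximate max_x f(x) using the grid {0, 1/n, ..., 1} per coordinate.

    Because f is additive, max_x f(x) = sum_j max_u f_j(u): each active
    coordinate is maximized separately, costing O(s * n) evaluations
    rather than O(n ** d) for a full grid over [0, 1]^d.
    """
    grid = [i / n for i in range(n + 1)]
    return sum(max(f_j(u) for u in grid) for f_j in components.values())


def cumulative_regret(f: Callable[[Sequence[float]], float],
                      best_value: float,
                      arms: Sequence[Sequence[float]]) -> float:
    """Sum over rounds t of (best_value - f(x_t)), using expected rewards."""
    return sum(best_value - f(x) for x in arms)


def regret_exponent(beta: float) -> float:
    """Exponent (beta + 1) / (2 * beta + 1); it tends to 1 as beta -> 0 and to 1/2 as beta -> infinity."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    return (beta + 1) / (2 * beta + 1)


def minimax_rate(s: int, T: int, beta: float) -> float:
    """s * T ** ((beta + 1) / (2 * beta + 1)), ignoring constants and log factors.

    Note that the ambient dimension d is not an argument.
    """
    return s * T ** regret_exponent(beta)


if __name__ == "__main__":
    # Example with d = 5 and s = 2 active coordinates (0 and 2):
    # one Lipschitz component and one smooth component.
    comps = {0: lambda u: -abs(u - 0.3), 2: lambda u: -(u - 0.7) ** 2}
    f = additive_reward(comps)
    best = best_value_on_grid(comps, n=10)
    arms = [[0.3, 0.1, 0.7, 0.5, 0.9], [0.5, 0.0, 0.7, 0.0, 0.0]]
    print("regret:", cumulative_regret(f, best, arms))
    print("rate (s=3, T=1e6, beta=1):", minimax_rate(3, 10**6, 1.0))
```

For example, with $\beta = 1$ the exponent is $2/3$, so for $T = 10^6$ rounds and $s = 3$ active coordinates the rate evaluates to about $3 \times 10^4$, whether $d$ is 5 or 500. The grid helper is only there to show why additivity helps: maximizing coordinate by coordinate avoids a search whose cost grows exponentially in $d$.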