Stochastic continuum-armed bandits with additive models: Minimax regrets and adaptive algorithm

The Annals of Statistics, Pub Date: 2022-08-01, DOI: 10.1214/22-aos2182

T. Cai, Hongming Pu


Citations: 3

Abstract


Stochastic continuum-armed bandits with additive models: Minimax regrets and adaptive algorithm

We consider $d$-dimensional stochastic continuum-armed bandits (SCAB) with the expected reward function being additive $\beta$-Hölder with sparsity $s$, for $0 < \beta < \infty$ and $1 \le s \le d$. The rate of convergence $\tilde{O}\bigl(s \cdot T^{\frac{\beta+1}{2\beta+1}}\bigr)$ for the minimax regret is established, where $T$ is the number of rounds. In particular, the minimax regret does not depend on $d$ and is linear in $s$. A novel algorithm is proposed and is shown to be rate-optimal, up to a logarithmic factor of $T$. The problem of adaptivity is also studied. A lower bound on the cost of adaptation to the smoothness is obtained, and the result implies that adaptation for free is impossible in general without further structural assumptions. We then consider adaptive additive SCAB under an additional self-similarity assumption. An adaptive procedure is constructed and is shown to simultaneously achieve the minimax regret for a range of smoothness levels.
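To give a feel for the rate stated in the abstract, the following minimal sketch evaluates the exponent $(\beta+1)/(2\beta+1)$ of $T$ for a few smoothness levels. The function names `regret_exponent` and `regret_rate` are hypothetical, and the sketch drops the constants and logarithmic factors hidden in the $\tilde{O}$ notation, so it only illustrates how the rate scales. It is not the algorithm proposed in the paper.

```python
from fractions import Fraction


def regret_exponent(beta):
    """Exponent (beta + 1) / (2 * beta + 1) of T in the stated minimax rate."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    return (beta + 1) / (2 * beta + 1)


def regret_rate(s, T, beta):
    """s * T ** exponent; constants and log factors are deliberately omitted."""
    return s * T ** regret_exponent(beta)


if __name__ == "__main__":
    # Exact exponents for a few illustrative smoothness levels (assumed values).
    for beta in (Fraction(1, 2), Fraction(1), Fraction(2)):
        print(f"beta = {beta}: exponent = {regret_exponent(beta)}")
```

The exponent decreases toward 1/2 as $\beta$ grows, so smoother reward functions lead to slower regret growth in $T$, while the dependence on the sparsity $s$ stays linear and the ambient dimension $d$ does not appear.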


Source journal

The Annals of Statistics

Self-citation rate

0.00%

Number of published articles