Stochastic continuum-armed bandits with additive models: Minimax regrets and adaptive algorithm

T. Cai, Hongming Pu
The Annals of Statistics · Published 2022-08-01 · DOI: 10.1214/22-aos2182
Citations: 3

Abstract

We consider d-dimensional stochastic continuum-armed bandits with the expected reward function being additive β-Hölder with sparsity s, for 0 < β < ∞ and 1 ≤ s ≤ d. The rate of convergence Õ(s · T^((β+1)/(2β+1))) for the minimax regret is established, where T is the number of rounds. In particular, the minimax regret does not depend on d and is linear in s. A novel algorithm is proposed and shown to be rate-optimal up to a logarithmic factor of T. The problem of adaptivity is also studied. A lower bound on the cost of adaptation to the smoothness is obtained, and the result implies that adaptation for free is impossible in general without further structural assumptions. We then consider adaptive additive SCAB under an additional self-similarity assumption. An adaptive procedure is constructed and shown to simultaneously achieve the minimax regret for a range of smoothness levels.
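Written out in standard notation, the minimax rate in the abstract reads as follows. The regret definition below is the standard cumulative-regret definition for stochastic bandits, not quoted from the paper itself:

```latex
% R_T(f): cumulative regret of a policy playing arms x_1, ..., x_T
% against a reward function f over T rounds.
\[
  R_T(f) \;=\; \sum_{t=1}^{T} \Bigl( \max_{x} f(x) - f(x_t) \Bigr),
  \qquad
  \inf_{\pi} \, \sup_{f} \; \mathbb{E}\, R_T(f)
  \;=\; \widetilde{O}\!\Bigl( s \cdot T^{\frac{\beta+1}{2\beta+1}} \Bigr),
\]
% where the supremum runs over additive, s-sparse, beta-Holder reward
% functions on [0,1]^d. Note the rate is free of d and linear in s.
```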
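The paper's own algorithm (which exploits additivity and sparsity in d dimensions) is not reproduced in the abstract. For intuition, the exponent (β+1)/(2β+1) already appears in the classical one-dimensional baseline: discretize [0, 1] into K ≈ T^(1/(2β+1)) arms and run UCB1 on the grid. The sketch below implements that baseline under illustrative assumptions (β = 1, Gaussian noise); function and parameter names are ours, not the paper's:

```python
import math
import random

def discretized_ucb(f, T, beta=1.0, noise=0.1, seed=0):
    """UCB1 on a uniform grid over [0, 1].

    Classical baseline for 1-D continuum-armed bandits with a
    beta-Holder mean reward f: the grid has K ~ T^(1/(2*beta+1))
    arms, balancing discretization bias against estimation error.
    Returns the cumulative reward collected over T rounds.
    """
    rng = random.Random(seed)
    K = max(2, int(T ** (1.0 / (2.0 * beta + 1.0))))
    arms = [(k + 0.5) / K for k in range(K)]  # grid midpoints
    counts = [0] * K   # pulls per arm
    means = [0.0] * K  # running empirical means
    total = 0.0
    for t in range(T):
        if t < K:
            i = t  # pull every arm once to initialize
        else:
            # UCB1 index: empirical mean + exploration bonus
            i = max(range(K),
                    key=lambda k: means[k]
                    + math.sqrt(2.0 * math.log(t + 1) / counts[k]))
        r = f(arms[i]) + rng.gauss(0.0, noise)
        counts[i] += 1
        means[i] += (r - means[i]) / counts[i]
        total += r
    return total

# Toy run: unimodal reward peaked at x = 0.7 (max value 1.0).
reward = discretized_ucb(lambda x: 1.0 - abs(x - 0.7), T=5000)
```

With T = 5000 and β = 1 the grid has about T^(1/3) ≈ 17 arms, so the average reward per round should land close to the optimum of 1.0, reflecting the sublinear regret of the discretize-then-UCB strategy.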