{"title":"Stochastic continuum-armed bandits with additive models: Minimax regrets and adaptive algorithm","authors":"T. Cai, Hongming Pu","doi":"10.1214/22-aos2182","DOIUrl":null,"url":null,"abstract":"We consider d -dimensional stochastic continuum-armed bandits with the expected reward function being additive β -H¨older with sparsity s for 0 < β < ∞ and 1 ≤ s ≤ d . The rate of convergence ˜ O ( s · T β +1 2 β +1 ) for the minimax regret is established where T is the number of rounds. In particular, the minimax regret does not depend on d and is linear in s . A novel algorithm is proposed and is shown to be rate-optimal, up to a logarithmic factor of T . The problem of adaptivity is also studied. A lower bound on the cost of adaptation to the smoothness is obtained and the result implies that adaptation for free is impossible in general without further structural assumptions. We then consider adaptive additive SCAB under an additional self-similarity assumption. An adaptive procedure is constructed and is shown to simultaneously achieve the minimax regret for a range of smoothness levels.","PeriodicalId":22375,"journal":{"name":"The Annals of Statistics","volume":"46 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"The Annals of Statistics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1214/22-aos2182","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3
Abstract
We consider d-dimensional stochastic continuum-armed bandits (SCAB) with the expected reward function being additive β-Hölder with sparsity s, for 0 < β < ∞ and 1 ≤ s ≤ d. The rate of convergence Õ(s · T^{(β+1)/(2β+1)}) for the minimax regret is established, where T is the number of rounds. In particular, the minimax regret does not depend on d and is linear in s. A novel algorithm is proposed and is shown to be rate-optimal, up to a logarithmic factor of T. The problem of adaptivity is also studied. A lower bound on the cost of adaptation to the smoothness is obtained, and the result implies that adaptation for free is impossible in general without further structural assumptions. We then consider adaptive additive SCAB under an additional self-similarity assumption. An adaptive procedure is constructed and is shown to simultaneously achieve the minimax regret for a range of smoothness levels.
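For reference, the abstract's main rate can be written out as a display equation. This is only a restatement under our own shorthand: R_T(π, f) for the cumulative regret of a policy π after T rounds, and H(β, s) for the class of additive, s-sparse, β-Hölder reward functions; neither symbol is taken from the paper.

\[
  \inf_{\pi}\ \sup_{f \in \mathcal{H}(\beta, s)} \mathbb{E}\, R_T(\pi, f)
  \;=\; \tilde{O}\!\left( s \, T^{\frac{\beta + 1}{2\beta + 1}} \right),
  \qquad 0 < \beta < \infty, \quad 1 \le s \le d,
\]

so, as the abstract notes, the regret grows linearly in the sparsity s and, up to the logarithmic factor hidden in Õ, does not involve the ambient dimension d.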