Anytime Exploration for Multi-armed Bandits using Confidence Information.

JMLR Workshop and Conference Proceedings, Pub Date: 2016-06-01

Kwang-Sung Jun, Robert Nowak

Citations: 0

Abstract

We introduce anytime Explore-m, a pure exploration problem for multi-armed bandits (MAB) that requires making a prediction of the top-m arms at every time step. Anytime Explore-m is more practical than the fixed-budget or fixed-confidence formulations of the top-m problem, since many applications involve a finite but unpredictable budget. However, the development and analysis of anytime algorithms present many challenges. We propose AT-LUCB (AnyTime Lower and Upper Confidence Bound), the first nontrivial algorithm that provably solves anytime Explore-m. Our analysis shows that the sample complexity of AT-LUCB is competitive with that of anytime variants of existing algorithms. Moreover, our empirical evaluation of AT-LUCB shows that it performs as well as or better than state-of-the-art baseline methods for anytime Explore-m.
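To make the anytime requirement concrete, the following is a minimal sketch of the setting only; it is not the AT-LUCB algorithm, whose details the abstract does not give. It assumes Bernoulli arms, a generic LUCB-style sampling rule (pull the top-m arm with the lowest lower bound and the remaining arm with the highest upper bound), and a Hoeffding-style confidence radius whose constants are chosen here for illustration. The names `anytime_top_m` and `confidence_radius` are hypothetical.

```python
import math
import random


def confidence_radius(pulls, t, n_arms, delta):
    """Hoeffding-style radius; the constants are an illustrative assumption."""
    return math.sqrt(math.log(4.0 * n_arms * t * t / delta) / (2.0 * pulls))


def anytime_top_m(means, m, horizon, delta=0.1, seed=0):
    """Simulate Bernoulli arms and return the top-m prediction made at each step.

    Assumes 0 < m < len(means). This is a generic LUCB-style sketch,
    not the AT-LUCB procedure from the paper.
    """
    rng = random.Random(seed)
    n = len(means)
    pulls = [0] * n
    sums = [0.0] * n

    def pull(i):
        # Draw a Bernoulli reward for arm i and update its statistics.
        sums[i] += 1.0 if rng.random() < means[i] else 0.0
        pulls[i] += 1

    for i in range(n):  # one initial pull per arm
        pull(i)

    predictions = []
    for t in range(1, horizon + 1):
        emp = [sums[i] / pulls[i] for i in range(n)]
        order = sorted(range(n), key=lambda i: emp[i], reverse=True)
        top, rest = order[:m], order[m:]
        # Anytime requirement: a top-m prediction is available at every step.
        predictions.append(sorted(top))
        rad = [confidence_radius(pulls[i], t, n, delta) for i in range(n)]
        weakest = min(top, key=lambda i: emp[i] - rad[i])      # lowest lower bound
        challenger = max(rest, key=lambda i: emp[i] + rad[i])  # highest upper bound
        pull(weakest)
        pull(challenger)
    return predictions
```

Because a prediction is recorded at every step, the caller can stop at any point and use the most recent entry, which is what distinguishes the anytime setting from one where the budget must be known in advance.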


