Generic Asymptotically Optimal Algorithms for Multi-Armed Bandits

2018 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton). Pub Date: 2018-10-01. DOI: 10.1109/ALLERTON.2018.8635908

Richard Combes, Stefan Magureanu, A. Proutière

Abstract

In this presentation, we address generic multi-armed bandit problems with stochastic rewards and known structure. Our notion of structure is generic and includes well-studied bandit structures such as linear, combinatorial, unimodal, Lipschitz, and dueling bandits. We propose a generic algorithm and prove its asymptotic optimality as the time horizon goes to infinity. We further provide a finite-time regret analysis of our algorithm. As a byproduct of our analysis, we develop several novel technical results that are useful for analyzing generic bandit problems. More details can be found in the technical report https://arxiv.org/abs/1711.00400.
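
For readers less familiar with the terminology, the two performance notions named in the abstract can be written out as below. This is only a sketch in standard bandit notation, all of which is assumed here since the abstract introduces no symbols: $\theta$ is the unknown problem parameter, $\mu_a(\theta)$ the mean reward of arm $a$, $a_t$ the arm played in round $t$ by a policy $\pi$, and $C(\theta)$ the constant appearing in the problem-dependent regret lower bound for the given structure.

```latex
% Regret of a policy \pi after T rounds (standard definition; notation assumed).
R^{\pi}(T, \theta) = T \max_{a} \mu_a(\theta)
    - \mathbb{E}\left[ \sum_{t=1}^{T} \mu_{a_t}(\theta) \right]

% Asymptotic optimality: the regret growth rate matches the lower-bound constant.
\limsup_{T \to \infty} \frac{R^{\pi}(T, \theta)}{\ln T} \le C(\theta)
```

A finite-time analysis, by contrast, bounds $R^{\pi}(T, \theta)$ for each fixed horizon $T$ rather than only in the limit.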
