Generic Asymptotically Optimal Algorithms for Multi-Armed Bandits

Richard Combes, Stefan Magureanu, A. Proutière
{"title":"Generic Asymptotically Optimal Algorithms for Multi-Armed Bandits","authors":"Richard Combes, Stefan Magureanu, A. Proutière","doi":"10.1109/ALLERTON.2018.8635908","DOIUrl":null,"url":null,"abstract":"In this presentation, we address generic multi-armed bandit problems with stochastic rewards and known structure. Our notion of structure is generic and includes well-studied bandit structures such as linear, combinatorial, unimodal, Lipschitz, dueling etc. We propose a generic algorithm and prove its asymptotic optimality when the time horizon goes to infinity. We further propose a finite time regret analysis of our algorithm. As a byproduct of our analysis we develop several novel technical results which are useful to analyze generic bandit problems. More details can be found in the technical report https://arxiv.org/abs/1711.00400.","PeriodicalId":299280,"journal":{"name":"2018 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ALLERTON.2018.8635908","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

In this presentation, we address generic multi-armed bandit problems with stochastic rewards and known structure. Our notion of structure is generic and includes well-studied bandit structures such as linear, combinatorial, unimodal, Lipschitz, dueling etc. We propose a generic algorithm and prove its asymptotic optimality when the time horizon goes to infinity. We further propose a finite time regret analysis of our algorithm. As a byproduct of our analysis we develop several novel technical results which are useful to analyze generic bandit problems. More details can be found in the technical report https://arxiv.org/abs/1711.00400.
多武装匪徒的一般渐近最优算法
在本报告中,我们讨论了具有随机奖励和已知结构的一般多臂强盗问题。我们对结构的概念是通用的,包括被充分研究的强盗结构,如线性、组合、单峰、利普希茨、决斗等。提出了一种通用算法,并证明了该算法在时间范围趋于无穷时的渐近最优性。我们进一步提出了算法的有限时间后悔分析。作为我们分析的副产品,我们开发了一些新的技术结果,这些结果对分析一般的强盗问题很有用。更多细节可以在技术报告https://arxiv.org/abs/1711.00400中找到。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信