多武装匪徒的一般渐近最优算法

2018 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton) Pub Date : 2018-10-01 DOI:10.1109/ALLERTON.2018.8635908

Richard Combes, Stefan Magureanu, A. Proutière

{"title":"多武装匪徒的一般渐近最优算法","authors":"Richard Combes, Stefan Magureanu, A. Proutière","doi":"10.1109/ALLERTON.2018.8635908","DOIUrl":null,"url":null,"abstract":"In this presentation, we address generic multi-armed bandit problems with stochastic rewards and known structure. Our notion of structure is generic and includes well-studied bandit structures such as linear, combinatorial, unimodal, Lipschitz, dueling etc. We propose a generic algorithm and prove its asymptotic optimality when the time horizon goes to infinity. We further propose a finite time regret analysis of our algorithm. As a byproduct of our analysis we develop several novel technical results which are useful to analyze generic bandit problems. More details can be found in the technical report https://arxiv.org/abs/1711.00400.","PeriodicalId":299280,"journal":{"name":"2018 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Generic Asymptotically Optimal Algorithms for Multi-Armed Bandits\",\"authors\":\"Richard Combes, Stefan Magureanu, A. Proutière\",\"doi\":\"10.1109/ALLERTON.2018.8635908\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this presentation, we address generic multi-armed bandit problems with stochastic rewards and known structure. Our notion of structure is generic and includes well-studied bandit structures such as linear, combinatorial, unimodal, Lipschitz, dueling etc. We propose a generic algorithm and prove its asymptotic optimality when the time horizon goes to infinity. We further propose a finite time regret analysis of our algorithm. As a byproduct of our analysis we develop several novel technical results which are useful to analyze generic bandit problems. More details can be found in the technical report https://arxiv.org/abs/1711.00400.\",\"PeriodicalId\":299280,\"journal\":{\"name\":\"2018 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton)\",\"volume\":\"12 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ALLERTON.2018.8635908\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ALLERTON.2018.8635908","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

在本报告中，我们讨论了具有随机奖励和已知结构的一般多臂强盗问题。我们对结构的概念是通用的，包括被充分研究的强盗结构，如线性、组合、单峰、利普希茨、决斗等。提出了一种通用算法，并证明了该算法在时间范围趋于无穷时的渐近最优性。我们进一步提出了算法的有限时间后悔分析。作为我们分析的副产品，我们开发了一些新的技术结果，这些结果对分析一般的强盗问题很有用。更多细节可以在技术报告https://arxiv.org/abs/1711.00400中找到。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Generic Asymptotically Optimal Algorithms for Multi-Armed Bandits

In this presentation, we address generic multi-armed bandit problems with stochastic rewards and known structure. Our notion of structure is generic and includes well-studied bandit structures such as linear, combinatorial, unimodal, Lipschitz, dueling etc. We propose a generic algorithm and prove its asymptotic optimality when the time horizon goes to infinity. We further propose a finite time regret analysis of our algorithm. As a byproduct of our analysis we develop several novel technical results which are useful to analyze generic bandit problems. More details can be found in the technical report https://arxiv.org/abs/1711.00400.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2018 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton)

自引率

0.00%

发文量