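On effectiveness of the Mirror Descent Algorithm for a stochastic multi-armed bandit governed by a stationary finite Markov chain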

2013 Australian Control Conference. Pub Date: 2013-07-17. DOI: 10.1109/AUCC.2013.6697280

A. Nazin, B. Miller


Citations: 0

Abstract



On effectiveness of the Mirror Descent Algorithm for a stochastic multi-armed bandit governed by a stationary finite Markov chain

In this article, we study the effectiveness of the recently developed Mirror Descent Randomized Control Algorithm applied to a class of homogeneous finite Markov chains governed by a stochastic multi-armed bandit with unknown mean losses. We prove explicit, non-asymptotic upper and lower bounds on the mean losses at a given (finite) time horizon. These bounds are very similar as functions of the problem parameters and the time horizon, but differ in the logarithmic term and the absolute constant. A numerical example illustrates the theoretical results.
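To make the idea of a randomized mirror descent control concrete, here is a minimal Python sketch. It is not the authors' algorithm: it assumes independent Bernoulli losses rather than the Markov-chain setting of the paper, and it uses the textbook entropic mirror descent update with importance-weighted loss estimates. The function name, its parameters, and the step size are all hypothetical choices made for illustration.

```python
import math
import random


def entropic_mirror_descent_bandit(mean_losses, horizon, step_size, seed=0):
    """Simplified mirror descent bandit (illustrative, i.i.d. Bernoulli losses).

    Returns the average realized loss and the policy used in the final round.
    """
    rng = random.Random(seed)
    n_arms = len(mean_losses)
    # Dual variable: cumulative importance-weighted loss estimates per arm.
    cumulative_estimates = [0.0] * n_arms
    policy = [1.0 / n_arms] * n_arms
    total_loss = 0.0
    for _ in range(horizon):
        # Mirror step with the entropy mirror map: softmax of the negated
        # dual variable. Subtracting the minimum keeps exp() well scaled.
        shift = min(cumulative_estimates)
        weights = [math.exp(-step_size * (g - shift)) for g in cumulative_estimates]
        normalizer = sum(weights)
        policy = [w / normalizer for w in weights]
        # Randomized control: sample one arm from the current policy.
        arm = rng.choices(range(n_arms), weights=policy)[0]
        # Bernoulli loss whose mean is unknown to the learner (assumption).
        loss = 1.0 if rng.random() < mean_losses[arm] else 0.0
        total_loss += loss
        # Unbiased estimate of the loss vector: only the played arm changes.
        cumulative_estimates[arm] += loss / policy[arm]
    return total_loss / horizon, policy


if __name__ == "__main__":
    arms = [0.2, 0.5, 0.8]  # hypothetical mean losses
    T = 5000
    eta = math.sqrt(math.log(len(arms)) / (T * len(arms)))  # a common step-size choice
    average_loss, final_policy = entropic_mirror_descent_bandit(arms, T, eta)
    print(average_loss, final_policy)
```

The step size above scales as the square root of log(K) / (T K) for K arms and horizon T, a standard choice for this kind of update. The constants and logarithmic terms in the paper's own bounds may call for a different tuning.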

