Multi-armed Bandit with Additional Observations

Abstracts of the 2018 ACM International Conference on Measurement and Modeling of Computer Systems. Pub Date: 2018-04-03. DOI: 10.1145/3219617.3219639

Donggyu Yun, Sumyeong Ahn, A. Proutière, Jinwoo Shin, Yung Yi


Citations: 0

Abstract



We study multi-armed bandit (MAB) problems with additional observations: in each round, the decision maker selects an arm to play and can also observe the rewards of additional arms, within a given budget, by paying certain costs. We propose algorithms whose regret is asymptotically optimal under stochastic rewards and order-optimal under adversarial rewards.
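To make the interaction protocol concrete, here is a minimal simulation sketch of the stochastic setting. It is not the authors' algorithm: the Bernoulli arms, the UCB1-style index, the rule of buying one extra observation of the least-sampled arm per round, the reading of the budget as a total over the whole horizon, and all names (`run_bandit`, `budget`, `cost`) are assumptions made purely for illustration.

```python
import math
import random


def run_bandit(means, horizon, budget, cost, seed=0):
    """Simulate a bandit with paid additional observations.

    Hypothetical setup (not from the paper): Bernoulli arms, a UCB1-style
    index to pick the played arm, and one extra observation per round of
    the least-sampled other arm for as long as the total budget allows.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # number of observed samples per arm
    sums = [0.0] * k   # sum of observed rewards per arm
    total_reward = 0.0
    spent = 0.0

    def ucb_index(arm, t):
        # Arms that have never been observed are tried first.
        if counts[arm] == 0:
            return float("inf")
        mean = sums[arm] / counts[arm]
        return mean + math.sqrt(2.0 * math.log(t) / counts[arm])

    for t in range(1, horizon + 1):
        # Every arm draws a reward this round; only some are revealed.
        rewards = [1.0 if rng.random() < m else 0.0 for m in means]

        # Play one arm and collect its reward.
        played = max(range(k), key=lambda a: ucb_index(a, t))
        total_reward += rewards[played]
        observed = [played]

        # Optionally pay to observe one additional arm (assumed rule).
        if k > 1 and spent + cost <= budget:
            others = [a for a in range(k) if a != played]
            extra = min(others, key=lambda a: counts[a])
            observed.append(extra)
            spent += cost

        # Update statistics for every arm whose reward was revealed.
        for a in observed:
            counts[a] += 1
            sums[a] += rewards[a]

    return total_reward, spent, counts


total, spent, counts = run_bandit([0.2, 0.5, 0.8], horizon=500,
                                  budget=50.0, cost=1.0, seed=1)
print(total, spent, counts)
```

Only the played arm contributes to the collected reward; a purchased observation improves the estimates without changing the payoff of that round, which is the trade-off between observation cost and regret that the abstract describes.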

