Approximation Algorithms for Correlated Knapsacks and Non-martingale Bandits

Anupam Gupta, Ravishankar Krishnaswamy, M. Molinaro, R. Ravi
{"title":"相关背包与非鞅强盗的逼近算法","authors":"Anupam Gupta, Ravishankar Krishnaswamy, M. Molinaro, R. Ravi","doi":"10.1109/FOCS.2011.48","DOIUrl":null,"url":null,"abstract":"In the stochastic knapsack problem, we are given a knapsack of size B, and a set of items whose sizes and rewards are drawn from a known probability distribution. To know the actual size and reward we have to schedule the item -- when it completes, we get to know these values. The goal is to schedule the items (possibly making adaptive decisions based on the sizes seen so far) to maximize the expected total reward of items which successfully pack into the knapsack. We know constant-factor approximations when (i) the rewards and sizes are independent, and (ii) we cannot prematurely cancel items after we schedule them. What if either or both assumptions are relaxed? Related stochastic packing problems are the multi-armed bandit (and budgeted learning) problems, here one is given several arms which evolve in a specified stochastic fashion with each pull, and the goal is to (adaptively) decide which arms to pull, in order to maximize the expected reward obtained after B pulls in total. Much recent work on this problem focuses on the case when the evolution of each arm follows a martingale, i.e., when the expected reward from one pull of an arm is the same as the reward at the current state. What if the rewards do not form a martingale? In this paper, we give O(1)-approximation algorithms for the stochastic knapsack problem with correlations and/or cancellations. Extending the ideas developed here, we give O(1)-approximations for MAB problems without the martingale assumption. Indeed, we can show that previously proposed linear programming relaxations for these problems have large integrality gaps. So we propose new time-indexed LP relaxations, using a decomposition and \"gap-filling\" approach, we convert these fractional solutions to distributions over strategies, and then use the LP values and the time ordering information from these strategies to devise randomized adaptive scheduling algorithms.","PeriodicalId":326048,"journal":{"name":"2011 IEEE 52nd Annual Symposium on Foundations of Computer Science","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2011-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"76","resultStr":"{\"title\":\"Approximation Algorithms for Correlated Knapsacks and Non-martingale Bandits\",\"authors\":\"Anupam Gupta, Ravishankar Krishnaswamy, M. Molinaro, R. Ravi\",\"doi\":\"10.1109/FOCS.2011.48\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In the stochastic knapsack problem, we are given a knapsack of size B, and a set of items whose sizes and rewards are drawn from a known probability distribution. To know the actual size and reward we have to schedule the item -- when it completes, we get to know these values. The goal is to schedule the items (possibly making adaptive decisions based on the sizes seen so far) to maximize the expected total reward of items which successfully pack into the knapsack. We know constant-factor approximations when (i) the rewards and sizes are independent, and (ii) we cannot prematurely cancel items after we schedule them. What if either or both assumptions are relaxed? 
Related stochastic packing problems are the multi-armed bandit (and budgeted learning) problems, here one is given several arms which evolve in a specified stochastic fashion with each pull, and the goal is to (adaptively) decide which arms to pull, in order to maximize the expected reward obtained after B pulls in total. Much recent work on this problem focuses on the case when the evolution of each arm follows a martingale, i.e., when the expected reward from one pull of an arm is the same as the reward at the current state. What if the rewards do not form a martingale? In this paper, we give O(1)-approximation algorithms for the stochastic knapsack problem with correlations and/or cancellations. Extending the ideas developed here, we give O(1)-approximations for MAB problems without the martingale assumption. Indeed, we can show that previously proposed linear programming relaxations for these problems have large integrality gaps. So we propose new time-indexed LP relaxations, using a decomposition and \\\"gap-filling\\\" approach, we convert these fractional solutions to distributions over strategies, and then use the LP values and the time ordering information from these strategies to devise randomized adaptive scheduling algorithms.\",\"PeriodicalId\":326048,\"journal\":{\"name\":\"2011 IEEE 52nd Annual Symposium on Foundations of Computer Science\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-02-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"76\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2011 IEEE 52nd Annual Symposium on Foundations of Computer Science\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/FOCS.2011.48\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 IEEE 52nd Annual Symposium on Foundations of Computer Science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FOCS.2011.48","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 76

Abstract

In the stochastic knapsack problem, we are given a knapsack of size B and a set of items whose sizes and rewards are drawn from a known probability distribution. The actual size and reward of an item are revealed only when we schedule it and it completes. The goal is to schedule the items (possibly making adaptive decisions based on the sizes seen so far) to maximize the expected total reward of items that successfully pack into the knapsack. Constant-factor approximations are known when (i) the rewards and sizes are independent, and (ii) items cannot be prematurely cancelled after they are scheduled. What if either or both assumptions are relaxed?

Related stochastic packing problems are the multi-armed bandit (and budgeted learning) problems, where one is given several arms that evolve in a specified stochastic fashion with each pull, and the goal is to (adaptively) decide which arms to pull in order to maximize the expected reward obtained after B pulls in total. Much recent work on this problem focuses on the case where the evolution of each arm follows a martingale, i.e., where the expected reward from one pull of an arm is the same as the reward at the current state. What if the rewards do not form a martingale?

In this paper, we give O(1)-approximation algorithms for the stochastic knapsack problem with correlations and/or cancellations. Extending the ideas developed here, we give O(1)-approximations for MAB problems without the martingale assumption. Indeed, we can show that previously proposed linear programming relaxations for these problems have large integrality gaps. We therefore propose new time-indexed LP relaxations. Using a decomposition and "gap-filling" approach, we convert these fractional solutions to distributions over strategies, and then use the LP values and the time-ordering information from these strategies to devise randomized adaptive scheduling algorithms.
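To make the problem setup concrete, below is a minimal Python sketch (not taken from the paper) that encodes a small correlated stochastic knapsack instance and estimates the expected reward of a simple adaptive policy by Monte Carlo simulation. The instance data, the greedy rule, and the helper names (sample_outcome, greedy_policy, simulate) are illustrative assumptions only; the paper's algorithms are LP-based and carry O(1)-approximation guarantees, which this naive greedy rule does not. The harness only illustrates the evaluation model: an item's size and reward are revealed after it is scheduled, and reward is collected only when the item completes within the budget B.

```python
# Illustrative sketch (not the paper's algorithm): a Monte Carlo harness for the
# correlated stochastic knapsack. Each item has a joint distribution over
# (size, reward) pairs -- here a small discrete table -- and a policy only learns
# the realized size/reward of an item after scheduling it. Reward is collected
# only if the item finishes within the remaining budget. All item data below is
# made up for illustration.
import random

# items[i] = list of (probability, size, reward) outcomes; probabilities sum to 1.
items = {
    "a": [(0.5, 1, 0.0), (0.5, 3, 10.0)],   # reward correlated with size
    "b": [(0.8, 2, 4.0), (0.2, 6, 4.0)],
    "c": [(1.0, 2, 3.0)],                    # deterministic item
}
B = 5  # knapsack budget

def sample_outcome(outcomes, rng):
    """Draw one (size, reward) pair from a discrete distribution."""
    r, acc = rng.random(), 0.0
    for p, size, reward in outcomes:
        acc += p
        if r <= acc:
            return size, reward
    return outcomes[-1][1], outcomes[-1][2]

def greedy_policy(remaining, unscheduled):
    """A naive adaptive rule: pick the unscheduled item with the highest
    expected-reward-to-expected-size ratio that still fits in expectation."""
    best, best_ratio = None, -1.0
    for name in unscheduled:
        exp_size = sum(p * s for p, s, _ in items[name])
        exp_rew = sum(p * w for p, _, w in items[name])
        if exp_size <= remaining and exp_rew / exp_size > best_ratio:
            best, best_ratio = name, exp_rew / exp_size
    return best  # None means stop scheduling

def simulate(policy, trials=100_000, seed=0):
    """Estimate the expected total reward of an adaptive policy by simulation."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        remaining, unscheduled = B, set(items)
        while True:
            choice = policy(remaining, unscheduled)
            if choice is None:
                break
            unscheduled.discard(choice)
            size, reward = sample_outcome(items[choice], rng)
            remaining -= size
            if remaining >= 0:       # item completed within the budget
                total += reward
            else:                    # overflow: no reward, knapsack is full
                break
    return total / trials

if __name__ == "__main__":
    print(f"estimated expected reward of greedy policy: {simulate(greedy_policy):.3f}")
```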