Approximation Algorithms for Correlated Knapsacks and Non-martingale Bandits

Anupam Gupta, Ravishankar Krishnaswamy, M. Molinaro, R. Ravi
{"title":"Approximation Algorithms for Correlated Knapsacks and Non-martingale Bandits","authors":"Anupam Gupta, Ravishankar Krishnaswamy, M. Molinaro, R. Ravi","doi":"10.1109/FOCS.2011.48","DOIUrl":null,"url":null,"abstract":"In the stochastic knapsack problem, we are given a knapsack of size B, and a set of items whose sizes and rewards are drawn from a known probability distribution. To know the actual size and reward we have to schedule the item -- when it completes, we get to know these values. The goal is to schedule the items (possibly making adaptive decisions based on the sizes seen so far) to maximize the expected total reward of items which successfully pack into the knapsack. We know constant-factor approximations when (i) the rewards and sizes are independent, and (ii) we cannot prematurely cancel items after we schedule them. What if either or both assumptions are relaxed? Related stochastic packing problems are the multi-armed bandit (and budgeted learning) problems, here one is given several arms which evolve in a specified stochastic fashion with each pull, and the goal is to (adaptively) decide which arms to pull, in order to maximize the expected reward obtained after B pulls in total. Much recent work on this problem focuses on the case when the evolution of each arm follows a martingale, i.e., when the expected reward from one pull of an arm is the same as the reward at the current state. What if the rewards do not form a martingale? In this paper, we give O(1)-approximation algorithms for the stochastic knapsack problem with correlations and/or cancellations. Extending the ideas developed here, we give O(1)-approximations for MAB problems without the martingale assumption. Indeed, we can show that previously proposed linear programming relaxations for these problems have large integrality gaps. So we propose new time-indexed LP relaxations, using a decomposition and \"gap-filling\" approach, we convert these fractional solutions to distributions over strategies, and then use the LP values and the time ordering information from these strategies to devise randomized adaptive scheduling algorithms.","PeriodicalId":326048,"journal":{"name":"2011 IEEE 52nd Annual Symposium on Foundations of Computer Science","volume":"65 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"76","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 IEEE 52nd Annual Symposium on Foundations of Computer Science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FOCS.2011.48","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 76

Abstract

In the stochastic knapsack problem, we are given a knapsack of size B and a set of items whose sizes and rewards are drawn from a known probability distribution. To learn an item's actual size and reward, we have to schedule it; only when it completes do we observe these values. The goal is to schedule the items (possibly making adaptive decisions based on the sizes seen so far) to maximize the expected total reward of the items that successfully pack into the knapsack. Constant-factor approximations are known when (i) the rewards and sizes are independent, and (ii) items cannot be prematurely canceled once scheduled. What if either or both assumptions are relaxed? Related stochastic packing problems are the multi-armed bandit (MAB) and budgeted learning problems: here one is given several arms that evolve in a specified stochastic fashion with each pull, and the goal is to adaptively decide which arms to pull in order to maximize the expected reward obtained after B pulls in total. Much recent work on this problem focuses on the case where the evolution of each arm follows a martingale, i.e., where the expected reward from one pull of an arm equals the reward at its current state. What if the rewards do not form a martingale? In this paper, we give O(1)-approximation algorithms for the stochastic knapsack problem with correlations and/or cancellations. Extending the ideas developed here, we give O(1)-approximations for MAB problems without the martingale assumption. Indeed, we show that previously proposed linear programming relaxations for these problems have large integrality gaps. We therefore propose new time-indexed LP relaxations; using a decomposition and "gap-filling" approach, we convert these fractional solutions into distributions over strategies, and then use the LP values and the time-ordering information from these strategies to devise randomized adaptive scheduling algorithms.
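To make the setup concrete, here is a minimal simulation sketch of the correlated stochastic knapsack setting: each item has a joint distribution over (size, reward) pairs, an item's realization is revealed only when it completes, and a scheduled item earns its reward only if it finishes within the budget B. The distributions, item names, and the fixed-order policy below are hypothetical illustrations of the problem model only (cancellation is omitted), not the paper's algorithm.

```python
# Illustrative simulation of the correlated stochastic knapsack setting.
# All distributions and the fixed-order policy are hypothetical examples.
import random

B = 10  # knapsack budget

# Each item is a joint distribution over (size, reward) pairs -- sizes and
# rewards may be correlated, e.g. a large size can come with a large reward.
ITEMS = {
    "a": [((2, 1), 0.5), ((8, 10), 0.5)],   # ((size, reward), probability)
    "b": [((3, 4), 1.0)],
    "c": [((1, 0), 0.7), ((9, 30), 0.3)],
}

def sample(item):
    """Draw an actual (size, reward) for an item; revealed only on completion."""
    r, acc = random.random(), 0.0
    for (size, reward), p in ITEMS[item]:
        acc += p
        if r <= acc:
            return size, reward
    return ITEMS[item][-1][0]

def run_policy(order):
    """Schedule items in a fixed order; an item earns its reward only if it
    finishes within the remaining budget (no cancellation in this sketch)."""
    remaining, total = B, 0
    for item in order:
        size, reward = sample(item)
        remaining -= size
        if remaining >= 0:
            total += reward
        else:
            break  # item overran the budget; its reward is lost
    return total

# Estimate the expected reward of one fixed (non-adaptive) order.
est = sum(run_policy(["b", "a", "c"]) for _ in range(100_000)) / 100_000
print(f"estimated expected reward: {est:.2f}")
```

An adaptive policy would instead choose the next item based on the sizes observed so far; the paper's results bound the gap between such policies and LP-guided approximations.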