Learning in Auctions: Regret is Hard, Envy is Easy

C. Daskalakis, Vasilis Syrgkanis
{"title":"在拍卖中学习:后悔很难,嫉妒很容易","authors":"C. Daskalakis, Vasilis Syrgkanis","doi":"10.1109/FOCS.2016.31","DOIUrl":null,"url":null,"abstract":"An extensive body of recent work studies the welfare guarantees of simple and prevalent combinatorial auction formats, such as selling m items via simultaneous second price auctions (SiSPAs) [1], [2], [3]. These guarantees hold even when the auctions are repeatedly executed and the players use no-regret learning algorithms to choose their actions. Unfortunately, off-the-shelf no-regret learning algorithms for these auctions are computationally inefficient as the number of actions available to the players becomes exponential. We show that this obstacle is inevitable: there are no polynomial-time no-regret learning algorithms for SiSPAs, unless RP ⊇ NP, even when the bidders are unit-demand. Our lower bound raises the question of how good outcomes polynomially-bounded bidders may discover in such auctions. To answer this question, we propose a novel concept of learning in auctions, termed \"no-envy learning.\" This notion is founded upon Walrasian equilibrium, and we show that it is both efficiently implementable and results in approximately optimal welfare, even when the bidders have valuations from the broad class of fractionally subadditive (XOS) valuations (assuming demand oracle access to the valuations) or coverage valuations (even without demand oracles). No-envy learning outcomes are a relaxation of no-regret learning outcomes, which maintain their approximate welfare optimality while endowing them with computational tractability. Our positive and negative results extend to several auction formats that have been studied in the literature via the smoothness paradigm. Our positive results for XOS valuations are enabled by a novel Follow-The-Perturbed-Leader algorithm for settings where the number of experts and states of nature are both infinite, and the payoff function of the learner is non-linear. We show that this algorithm has applications outside of auction settings, establishing significant gains in a recent application of no-regret learning in security games. Our efficient learning result for coverage valuations is based on a novel use of convex rounding schemes and a reduction to online convex optimization.","PeriodicalId":414001,"journal":{"name":"2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"58","resultStr":"{\"title\":\"Learning in Auctions: Regret is Hard, Envy is Easy\",\"authors\":\"C. Daskalakis, Vasilis Syrgkanis\",\"doi\":\"10.1109/FOCS.2016.31\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"An extensive body of recent work studies the welfare guarantees of simple and prevalent combinatorial auction formats, such as selling m items via simultaneous second price auctions (SiSPAs) [1], [2], [3]. These guarantees hold even when the auctions are repeatedly executed and the players use no-regret learning algorithms to choose their actions. Unfortunately, off-the-shelf no-regret learning algorithms for these auctions are computationally inefficient as the number of actions available to the players becomes exponential. We show that this obstacle is inevitable: there are no polynomial-time no-regret learning algorithms for SiSPAs, unless RP ⊇ NP, even when the bidders are unit-demand. 
Our lower bound raises the question of how good outcomes polynomially-bounded bidders may discover in such auctions. To answer this question, we propose a novel concept of learning in auctions, termed \\\"no-envy learning.\\\" This notion is founded upon Walrasian equilibrium, and we show that it is both efficiently implementable and results in approximately optimal welfare, even when the bidders have valuations from the broad class of fractionally subadditive (XOS) valuations (assuming demand oracle access to the valuations) or coverage valuations (even without demand oracles). No-envy learning outcomes are a relaxation of no-regret learning outcomes, which maintain their approximate welfare optimality while endowing them with computational tractability. Our positive and negative results extend to several auction formats that have been studied in the literature via the smoothness paradigm. Our positive results for XOS valuations are enabled by a novel Follow-The-Perturbed-Leader algorithm for settings where the number of experts and states of nature are both infinite, and the payoff function of the learner is non-linear. We show that this algorithm has applications outside of auction settings, establishing significant gains in a recent application of no-regret learning in security games. Our efficient learning result for coverage valuations is based on a novel use of convex rounding schemes and a reduction to online convex optimization.\",\"PeriodicalId\":414001,\"journal\":{\"name\":\"2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)\",\"volume\":\"11 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-11-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"58\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/FOCS.2016.31\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FOCS.2016.31","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 58

Abstract

An extensive body of recent work studies the welfare guarantees of simple and prevalent combinatorial auction formats, such as selling m items via simultaneous second price auctions (SiSPAs) [1], [2], [3]. These guarantees hold even when the auctions are repeatedly executed and the players use no-regret learning algorithms to choose their actions. Unfortunately, off-the-shelf no-regret learning algorithms for these auctions are computationally inefficient as the number of actions available to the players becomes exponential. We show that this obstacle is inevitable: there are no polynomial-time no-regret learning algorithms for SiSPAs, unless RP ⊇ NP, even when the bidders are unit-demand. Our lower bound raises the question of how good outcomes polynomially-bounded bidders may discover in such auctions. To answer this question, we propose a novel concept of learning in auctions, termed "no-envy learning." This notion is founded upon Walrasian equilibrium, and we show that it is both efficiently implementable and results in approximately optimal welfare, even when the bidders have valuations from the broad class of fractionally subadditive (XOS) valuations (assuming demand oracle access to the valuations) or coverage valuations (even without demand oracles). No-envy learning outcomes are a relaxation of no-regret learning outcomes, which maintain their approximate welfare optimality while endowing them with computational tractability. Our positive and negative results extend to several auction formats that have been studied in the literature via the smoothness paradigm. Our positive results for XOS valuations are enabled by a novel Follow-The-Perturbed-Leader algorithm for settings where the number of experts and states of nature are both infinite, and the payoff function of the learner is non-linear. We show that this algorithm has applications outside of auction settings, establishing significant gains in a recent application of no-regret learning in security games. Our efficient learning result for coverage valuations is based on a novel use of convex rounding schemes and a reduction to online convex optimization.
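For reference, the notion of no-regret learning invoked above is the standard one; the following is the textbook definition of external regret, not notation taken from the paper itself.

```latex
% External regret of a learner playing actions a_1, ..., a_T from an action
% set A, against utility functions u_1, ..., u_T revealed online:
\[
  R_T \;=\; \max_{a^{\ast} \in A} \sum_{t=1}^{T} u_t(a^{\ast})
            \;-\; \sum_{t=1}^{T} u_t(a_t)
\]
% An algorithm is no-regret if R_T = o(T), i.e. the average regret
% R_T / T vanishes as T grows.
```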
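The abstract grounds no-envy learning in Walrasian equilibrium. A natural reading, paraphrased here rather than quoted from the paper, is that in hindsight no bidder should prefer having bought some fixed bundle of items at the realized prices:

```latex
% No-envy condition for a bidder with valuation v over T rounds of an
% m-item auction, where p_{j,t} is the price item j sold at in round t.
% (Paraphrase; the paper's formal definition may differ in details.)
\[
  \frac{1}{T} \sum_{t=1}^{T} u_t(a_t)
  \;\ge\;
  \max_{S \subseteq [m]} \; \frac{1}{T} \sum_{t=1}^{T}
    \Bigl( v(S) - \sum_{j \in S} p_{j,t} \Bigr)
\]
```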
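Concretely, a SiSPA runs one second-price auction per item, all in parallel. The sketch below is illustrative only: the function names, tie-breaking rule, and bid representation are assumptions of this sketch, not details from the paper. It resolves a single round, computes a unit-demand bidder's utility, and shows why off-the-shelf no-regret learners become inefficient here: even a coarse bid discretization yields exponentially many actions per bidder.

```python
# Minimal sketch of one round of simultaneous second-price auctions (SiSPAs).
# Assumptions (not from the paper): sealed bid vectors, ties broken toward
# the lowest bidder index, unit-demand bidders valuing only their best item.

def sispa_round(bids):
    """bids[i][j] = bid of bidder i on item j. Returns (winners, prices)."""
    n, m = len(bids), len(bids[0])
    winners, prices = [], []
    for j in range(m):
        order = sorted(range(n), key=lambda i: (-bids[i][j], i))
        winners.append(order[0])                              # highest bid wins item j
        prices.append(bids[order[1]][j] if n > 1 else 0.0)    # winner pays 2nd price
    return winners, prices

def unit_demand_utility(i, values, winners, prices):
    """Unit-demand: bidder i enjoys only the single best item won,
    yet pays for every item won (the 'exposure' problem)."""
    won = [j for j, w in enumerate(winners) if w == i]
    best = max((values[j] for j in won), default=0.0)
    return best - sum(prices[j] for j in won)

# The discretized action space of a single bidder is already exponential
# in the number of items m: with k bid levels there are k**m bid vectors.
k, m = 5, 10
print(k ** m)  # 9765625 actions for one bidder
```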
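The paper's positive result for XOS valuations rests on a Follow-The-Perturbed-Leader variant for infinitely many experts and non-linear learner payoffs. As a point of comparison, here is a minimal sketch of classic finite-expert FTPL in the style of Kalai and Vempala; it is the baseline the paper generalizes, not the paper's algorithm, and the exponential perturbations and eta scale are choices of this sketch.

```python
import random

def ftpl(payoff_rows, eta=0.1, seed=0):
    """Follow-The-Perturbed-Leader over finitely many experts.

    payoff_rows[t][e] = payoff of expert e in round t, revealed only
    after the learner commits to a choice for that round.
    """
    rng = random.Random(seed)
    n = len(payoff_rows[0])
    cumulative = [0.0] * n   # cumulative payoff of each expert so far
    total = 0.0
    for row in payoff_rows:
        # Perturb past cumulative payoffs with fresh exponential noise,
        # then follow the (perturbed) leader.
        noise = [rng.expovariate(1.0) / eta for _ in range(n)]
        choice = max(range(n), key=lambda e: cumulative[e] + noise[e])
        total += row[choice]
        for e in range(n):   # observe the round's payoffs afterwards
            cumulative[e] += row[e]
    return total
```

With fresh perturbations each round this matches the oblivious-adversary analysis; the scale eta trades off stability of the choice sequence against how quickly the learner tracks the current leader.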