Watch and learn: optimizing from revealed preferences feedback

Proceedings of the forty-eighth annual ACM symposium on Theory of Computing
Pub Date: 2015-04-04
DOI: 10.1145/2897518.2897579

Aaron Roth, Jonathan Ullman, Zhiwei Steven Wu


Citations: 58

Abstract




A Stackelberg game is played between a leader and a follower. The leader first chooses an action, then the follower plays his best response. The goal of the leader is to pick the action that will maximize his payoff given the follower’s best response. In this paper we present an approach to solving for the leader’s optimal strategy in certain Stackelberg games where the follower’s utility function (and thus the subsequent best response of the follower) is unknown. Stackelberg games capture, for example, the following interaction between a producer and a consumer. The producer chooses the prices of the goods he produces, and then a consumer chooses to buy a utility-maximizing bundle of goods. The goal of the seller here is to set prices to maximize his profit: his revenue minus the production cost of the purchased bundle. It is quite natural that the seller in this example should not know the buyer’s utility function. However, he does have access to revealed preference feedback: he can set prices, and then observe the purchased bundle and his own profit. We give algorithms for efficiently solving, in terms of both computational and query complexity, a broad class of Stackelberg games in which the follower’s utility function is unknown, using only “revealed preference” access to it. This class includes in particular the profit maximization problem, as well as the optimal tolling problem in nonatomic congestion games, when the latency functions are unknown. Surprisingly, we are able to solve these problems even though the optimization problems are non-convex in the leader’s actions.
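As a rough illustration of what "revealed preference" access means computationally, here is a minimal Python sketch. It is not the authors' algorithm. It assumes a hypothetical buyer whose utility over bundles x in [0, 1]^d is v(x) = sum_i a_i * sqrt(x_i), with weights A chosen only for illustration. The names A, best_response and find_inducing_prices are likewise hypothetical. The seller never inspects A. It only posts prices p and observes the purchased bundle x(p). The sketch solves one sub-task: finding prices that induce a chosen target bundle x*. It does this with a standard tatonnement-style update. The observed bundle x(p) gives the gradient x* - x(p) of the convex dual function L(p) = max_x [v(x) - <p, x>] + <p, x*>, so each price query yields one gradient step.

```python
# Hypothetical buyer: utility v(x) = sum_i a_i * sqrt(x_i) over bundles x in [0, 1]^d.
# The seller never reads A directly; it only calls best_response(prices).
A = [1.0, 2.0, 1.5]


def best_response(prices):
    """Bundle maximizing v(x) - <p, x> over [0, 1]^d (what the seller observes)."""
    bundle = []
    for a, p in zip(A, prices):
        if p <= 0:
            # Utility is increasing and the good costs nothing: buy the maximum.
            bundle.append(1.0)
        else:
            # First-order condition a / (2 * sqrt(x)) = p, clipped to the box [0, 1].
            bundle.append(min(1.0, (a / (2.0 * p)) ** 2))
    return bundle


def find_inducing_prices(target, step=0.5, iters=500, tol=1e-9):
    """Gradient descent on the convex dual L(p) = max_x [v(x) - <p, x>] + <p, target>.

    The gradient of L at p is target - x(p), so each revealed-preference
    query (one call to best_response) supplies one gradient.
    """
    prices = [1.0] * len(target)
    for _ in range(iters):
        bundle = best_response(prices)  # one revealed-preference query
        gaps = [b - t for b, t in zip(bundle, target)]
        if max(abs(g) for g in gaps) < tol:
            break
        # Raise the price of over-demanded goods and lower it for under-demanded ones.
        prices = [p + step * g for p, g in zip(prices, gaps)]
    return prices


if __name__ == "__main__":
    target = [0.25, 0.5, 0.81]
    prices = find_inducing_prices(target)
    print([round(p, 4) for p in prices])  # expected: [1.0, 1.4142, 0.8333]
```

With these hypothetical weights the loop recovers the analytic inducing prices a_i / (2 * sqrt(x*_i)). The step size 0.5 was picked by hand for this example. Other utilities may need a smaller step.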
