Fast convergence of learning in games (invited talk)

Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing. Pub Date: 2017-06-19. DOI: 10.1145/3055399.3084098

Vasilis Syrgkanis

Abstract

A plethora of recent work has analyzed properties of outcomes in games when each player employs a no-regret learning algorithm. When the game is played for T iterations, many algorithms achieve time-averaged regret against the best fixed action in hindsight that decays at a rate of O(1/√T). This rate is optimal in adversarial settings. However, in a game a player's opponents are minimizing their own regret rather than maximizing the player's regret. (Daskalakis et al. 2014) and (Rakhlin and Sridharan 2013) showed that O(1/T) rates are achievable in two-player zero-sum games. In (Syrgkanis et al. 2015), we show that O(1/T^{3/4}) rates are achievable in general multi-player games. We also analyze the convergence of the dynamics to approximately optimal social welfare, for which we show a convergence rate of O(1/T). The latter result was subsequently generalized to a broader class of learning algorithms by (Foster et al. 2016). This is based on joint work with Alekh Agarwal, Haipeng Luo and Robert E. Schapire.
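
The contrast between the adversarial O(1/√T) rate and the faster rates possible when every player is learning can be explored with a small simulation. What follows is a minimal sketch under stated assumptions. It does not reproduce the algorithms, step sizes, or analysis of the cited papers. Both players of a hypothetical 3×3 zero-sum game (`EXAMPLE_GAME`, chosen only for illustration) run either plain Hedge (multiplicative weights) or an "optimistic" variant. The optimistic variant adds the most recent utility vector to the cumulative one before exponentiating, which is a recency-bias prediction in the spirit of the optimistic methods studied in this line of work. The names `play`, `softmax`, and `EXAMPLE_GAME`, the payoff matrix, and the step size eta = 0.1 are all assumptions made for this example.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]

# Hypothetical zero-sum game (row player's payoffs, all in [-1, 1]); illustration only.
EXAMPLE_GAME: Matrix = [
    [0.0, -1.0, 1.0],
    [1.0, 0.0, -1.0],
    [-0.5, 1.0, 0.0],
]


def softmax(scores: List[float], eta: float) -> List[float]:
    """Distribution proportional to exp(eta * score), shifted by the max for stability."""
    top = max(scores)
    weights = [math.exp(eta * (s - top)) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def play(A: Matrix, T: int, eta: float,
         optimistic: bool) -> Tuple[float, float, List[float], List[float]]:
    """Run T rounds of (optimistic) Hedge for both players of the zero-sum game A.

    The row player earns x^T A y and the column player earns -x^T A y.
    Returns both players' external regrets and their time-averaged strategies.
    """
    n, m = len(A), len(A[0])
    U1, U2 = [0.0] * n, [0.0] * m        # cumulative utility of each pure action
    last1, last2 = [0.0] * n, [0.0] * m  # previous round's utility vectors
    earned1 = 0.0                        # row player's realized expected utility
    xsum, ysum = [0.0] * n, [0.0] * m
    for _ in range(T):
        if optimistic:
            # Recency bias: the last utility vector is counted twice.
            x = softmax([U1[i] + last1[i] for i in range(n)], eta)
            y = softmax([U2[j] + last2[j] for j in range(m)], eta)
        else:
            x = softmax(U1, eta)
            y = softmax(U2, eta)
        u1 = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]   # A y
        u2 = [-sum(x[i] * A[i][j] for i in range(n)) for j in range(m)]  # -x^T A
        earned1 += sum(x[i] * u1[i] for i in range(n))                   # x^T A y
        for i in range(n):
            U1[i] += u1[i]
            xsum[i] += x[i]
        for j in range(m):
            U2[j] += u2[j]
            ysum[j] += y[j]
        last1, last2 = u1, u2
    # External regret: best fixed action in hindsight minus realized utility.
    regret1 = max(U1) - earned1
    regret2 = max(U2) + earned1  # the column player's realized utility is -earned1
    return regret1, regret2, [v / T for v in xsum], [v / T for v in ysum]


if __name__ == "__main__":
    for opt in (False, True):
        r1, r2, _, _ = play(EXAMPLE_GAME, T=2000, eta=0.1, optimistic=opt)
        print("optimistic" if opt else "vanilla", "sum of regrets:", round(r1 + r2, 3))
```

Because the game is zero-sum, `r1 + r2` always equals T times the duality gap of the time-averaged strategies. It is therefore nonnegative, and (r1 + r2)/T measures how far average play is from equilibrium. Comparing the two printed totals as T grows is one way to see the effect of the optimistic prediction. No particular numbers are claimed here.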
