Faster First-Order Methods for Extensive-Form Game Solving

Proceedings of the Sixteenth ACM Conference on Economics and Computation. Pub Date: 2015-06-15. DOI: 10.1145/2764468.2764476

Christian Kroer, K. Waugh, F. Kılınç-Karzan, T. Sandholm


Citations: 49

Abstract



We study the problem of computing a Nash equilibrium in large-scale two-player zero-sum extensive-form games. While this problem can be solved in polynomial time, first-order or regret-based methods are usually preferred for large games. Regret-based methods have largely been favored in practice, in spite of their theoretically inferior convergence rates. In this paper we investigate the acceleration of first-order methods both theoretically and experimentally. An important component of many first-order methods is a distance-generating function. Motivated by this, we investigate a specific distance-generating function, namely the dilated entropy function, over treeplexes, which are convex polytopes that encompass the strategy spaces of perfect-recall extensive-form games. We develop significantly stronger bounds on the associated strong convexity parameter. In terms of extensive-form game solving, this improves the convergence rate of several first-order methods by a factor of O((#information sets ⋅ depth ⋅ M) / 2^depth), where M is the maximum value of the ℓ1 norm over the treeplex encoding the strategy spaces. Experimentally, we investigate the performance of three first-order methods (the excessive gap technique, mirror prox, and stochastic mirror prox) and compare their performance to the regret-based algorithms. In order to instantiate stochastic mirror prox, we develop a class of gradient sampling schemes for game trees. Equipped with our distance-generating function and sampling scheme, we find that mirror prox and the excessive gap technique outperform the prior regret-based methods for finding medium-accuracy solutions.
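To make the role of the distance-generating function concrete, the following is a minimal sketch of mirror prox on a small matrix game, not the paper's implementation. It assumes the simplest possible treeplex, a single simplex per player (one information set, depth 1), where the dilated entropy reduces to the ordinary entropy and the prox step becomes a multiplicative-weights update. The function names, the 2x2 payoff matrix, the step size, and the iteration count are all illustrative assumptions.

```python
import math


def entropy_prox_step(point, gradient, step_size):
    """Prox step on the simplex with the entropy distance-generating function.

    With entropy, the Bregman projection has a closed form: scale each
    coordinate by exp(-step_size * gradient) and renormalize.
    """
    weights = [p * math.exp(-step_size * g) for p, g in zip(point, gradient)]
    total = sum(weights)
    return [w / total for w in weights]


def loss_gradients(payoff, x, y):
    """Gradients of x^T A y: A y for the minimizer x, -(A^T x) for the maximizer y."""
    rows, cols = len(payoff), len(payoff[0])
    grad_x = [sum(payoff[i][j] * y[j] for j in range(cols)) for i in range(rows)]
    grad_y = [-sum(payoff[i][j] * x[i] for i in range(rows)) for j in range(cols)]
    return grad_x, grad_y


def saddle_gap(payoff, x, y):
    """Best-response value against x minus best-response value against y."""
    grad_x, grad_y = loss_gradients(payoff, x, y)
    return max(-g for g in grad_y) - min(grad_x)


def mirror_prox(payoff, iterations=2000, step_size=0.1):
    """Mirror prox for min_x max_y x^T A y over two simplices (illustrative only)."""
    rows, cols = len(payoff), len(payoff[0])
    x = [1.0 / rows] * rows
    y = [1.0 / cols] * cols
    avg_x = [0.0] * rows
    avg_y = [0.0] * cols
    for _ in range(iterations):
        # Extrapolation step: move from the current point to an intermediate point.
        grad_x, grad_y = loss_gradients(payoff, x, y)
        mid_x = entropy_prox_step(x, grad_x, step_size)
        mid_y = entropy_prox_step(y, grad_y, step_size)
        # Update step: move from the current point using the intermediate gradients.
        grad_x, grad_y = loss_gradients(payoff, mid_x, mid_y)
        x = entropy_prox_step(x, grad_x, step_size)
        y = entropy_prox_step(y, grad_y, step_size)
        # The convergence guarantee applies to the average of the intermediate points.
        avg_x = [a + m / iterations for a, m in zip(avg_x, mid_x)]
        avg_y = [a + m / iterations for a, m in zip(avg_y, mid_y)]
    return avg_x, avg_y


if __name__ == "__main__":
    # Hypothetical 2x2 zero-sum game; entries are losses for the row player.
    example_payoff = [[2.0, -1.0], [-1.0, 1.0]]
    strategy_x, strategy_y = mirror_prox(example_payoff)
    print(strategy_x, strategy_y, saddle_gap(example_payoff, strategy_x, strategy_y))
```

For this example game, equalizing the opponent's payoffs by hand gives the equilibrium (0.4, 0.6) for both players, so the averaged strategies should land close to that point. On a genuine extensive-form game the simplex would be replaced by a treeplex and the entropy by the dilated entropy studied in the paper, which this sketch does not attempt.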

