Nolan Bard, D. Nicholas, Csaba Szepesvari, Michael Bowling
{"title":"Decision-Theoretic Clustering of Strategies","authors":"Nolan Bard, D. Nicholas, Csaba Szepesvari, Michael Bowling","doi":"10.5555/2772879.2772886","DOIUrl":"https://doi.org/10.5555/2772879.2772886","url":null,"abstract":"Clustering agents by their behaviour can be crucial for building effective agent models. Traditional clustering typically aims to group entities together based on a distance metric, where a desirable clustering is one where the entities in a cluster are spatially close together. Instead, one may desire to cluster based on actionability, or the capacity for the clusters to suggest how an agent should respond to maximize their utility with respect to the entities. Segmentation problems examine this decision-theoretic clustering task. Although finding optimal solutions to these problems is computationally hard, greedy-based approximation algorithms exist. However, in settings where the agent has a combinatorially large number of candidate responses whose utilities must be considered, these algorithms are often intractable. In this work, we show that in many cases the utility function can be factored to allow for an efficient greedy algorithm even when there are exponentially large response spaces. We evaluate our technique theoretically, proving approximation bounds, and empirically using extensive-form games by clustering opponent strategies in toy poker games. Our results demonstrate that these techniques yield dramatically improved clusterings compared to a traditional distance-based clustering approach in terms of both subjective quality and utility obtained by responding to the clusters.","PeriodicalId":106568,"journal":{"name":"AAAI Workshop: Computer Poker and Imperfect Information","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121542143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hierarchical Abstraction, Distributed Equilibrium Computation, and Post-Processing, with Application to a Champion No-Limit Texas Hold'em Agent","authors":"Noam Brown, Sam Ganzfried, T. Sandholm","doi":"10.5555/2772879.2772885","DOIUrl":"https://doi.org/10.5555/2772879.2772885","url":null,"abstract":"The leading approach for solving large imperfect-information games is automated abstraction followed by running an equilibrium-finding algorithm. We introduce a distributed version of the most commonly used equilibrium-finding algorithm, counterfactual regret minimization (CFR), which enables CFR to scale to dramatically larger abstractions and numbers of cores. The new algorithm begets constraints on the abstraction so as to make the pieces running on different computers disjoint. We introduce an algorithm for generating such abstractions while capitalizing on state-of-the-art abstraction ideas such as imperfect recall and earth-mover's distance. Our techniques enabled an equilibrium computation of unprecedented size on a supercomputer with a high inter-blade memory latency. Prior approaches run slowly on this architecture. Our approach also leads to a significant improvement over using the prior best approach on a large shared-memory server with low memory latency. Finally, we introduce a family of post-processing techniques that outperform prior ones. We applied these techniques to generate an agent for two-player no-limit Texas Hold'em, called Tartanian7, that won the 2014 Annual Computer Poker Competition, beating each opponent with statistical significance.","PeriodicalId":106568,"journal":{"name":"AAAI Workshop: Computer Poker and Imperfect Information","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115892396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
K. Waugh, Dustin Morrill, J. Bagnell, Michael Bowling
{"title":"Solving Games with Functional Regret Estimation","authors":"K. Waugh, Dustin Morrill, J. Bagnell, Michael Bowling","doi":"10.1609/aaai.v29i1.9445","DOIUrl":"https://doi.org/10.1609/aaai.v29i1.9445","url":null,"abstract":"\u0000 \u0000 We propose a novel online learning method for minimizing regret in large extensive-form games. The approach learns a function approximator online to estimate the regret for choosing a particular action. A no-regret algorithm uses these estimates in place of the true regrets to define a sequence of policies. We prove the approach sound by providing a bound relating the quality of the function approximation and regret of the algorithm. A corollary being that the method is guaranteed to converge to a Nash equilibrium in self-play so long as the regrets are ultimately realizable by the function approximator. Our technique can be understood as a principled generalization of existing work onabstraction in large games; in our work, both the abstraction as well as the equilibrium are learned during self-play. We demonstrate empirically the method achieves higher quality strategies than state-of-the-art abstraction techniques given the same resources.\u0000 \u0000","PeriodicalId":106568,"journal":{"name":"AAAI Workshop: Computer Poker and Imperfect Information","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122231522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}