How a genetic algorithm learns to play Traveler's Dilemma by choosing dominated strategies to achieve greater payoffs

2009 IEEE Symposium on Computational Intelligence and Games | Pub Date: 2009-09-07 | DOI: 10.1109/CIG.2009.5286474

M. Pace


Citations: 11

Abstract

In game theory, the Traveler's Dilemma (abbreviated TD) is a non-zero-sum game in which two players each try to maximize their own payoff without deliberately seeking to harm the opponent. In the classical formulation of this problem, game theory predicts that, if both players are purely rational, they will always choose the strategy corresponding to the Nash equilibrium of the game. However, when the game is played experimentally, most human players select much higher values (usually close to $100), deviating strongly from the Nash equilibrium and obtaining, on average, much higher rewards. In this paper we analyze the behaviour of a genetic algorithm that, by repeatedly playing the game, evolves its strategy in order to maximize the payoffs. In the algorithm, the population has no a priori knowledge about the game. The fitness function rewards the individuals who obtain high payoffs at the end of each game session. We demonstrate that, when it is possible to assign a probability measure to each strategy, the search for good strategies can be effectively translated into a search problem in a measure space, which can be addressed using, for example, genetic algorithms. Furthermore, encoding the genome as a probability distribution allows the analysis of common crossover and mutation operators in the uncommon case where the genome is a probability measure.
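As a rough illustration of the setup the abstract describes, the following Python sketch encodes each genome as a probability distribution over claims and evolves a population by repeated play. It is not the paper's algorithm. The claim range of 2 to 100 and the bonus of 2 follow the commonly cited classical formulation and are assumptions here, since the abstract does not state them. The helper names (`payoff`, `crossover`, `mutate`, `fitness`, `evolve`), the truncation selection scheme, and all numeric rates are hypothetical choices made only for this example.

```python
import random

# Assumed classical parameters; the abstract does not state them.
LOW, HIGH, BONUS = 2, 100, 2
CLAIMS = list(range(LOW, HIGH + 1))

def payoff(mine, theirs, bonus=BONUS):
    """Payoff to the player claiming `mine` against a claim of `theirs`."""
    if mine == theirs:
        return mine
    low = min(mine, theirs)
    # The lower claimant earns a bonus, the higher claimant pays a penalty.
    return low + bonus if mine < theirs else low - bonus

def normalize(weights):
    """Rescale non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def random_genome(rng):
    """A genome is a probability distribution over all possible claims."""
    return normalize([rng.random() for _ in CLAIMS])

def crossover(p, q, rng):
    """Convex combination of two distributions, which is again a distribution."""
    a = rng.random()
    return [a * x + (1 - a) * y for x, y in zip(p, q)]

def mutate(p, rng, rate=0.05, scale=0.05):
    """Add non-negative noise to some entries, then renormalize."""
    noisy = [x + scale * rng.random() if rng.random() < rate else x for x in p]
    return normalize(noisy)

def fitness(genome, population, rng, rounds=20):
    """Average payoff over games against randomly drawn opponents."""
    total = 0
    for _ in range(rounds):
        opponent = rng.choice(population)
        mine = rng.choices(CLAIMS, weights=genome)[0]
        theirs = rng.choices(CLAIMS, weights=opponent)[0]
        total += payoff(mine, theirs)
    return total / rounds

def mean_claim(genome):
    """Expected claim under the genome's distribution."""
    return sum(c * p for c, p in zip(CLAIMS, genome))

def evolve(generations=30, size=40, seed=0):
    """Evolve a population of distributions; selection scheme is an assumption."""
    rng = random.Random(seed)
    population = [random_genome(rng) for _ in range(size)]
    for _ in range(generations):
        scored = sorted(population, key=lambda g: fitness(g, population, rng), reverse=True)
        parents = scored[: size // 2]  # truncation selection (hypothetical choice)
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(size - len(parents))
        ]
        population = parents + children
    return population
```

Under these assumed parameters, a claim of 100 is weakly dominated by a claim of 99: `payoff(99, t)` is never lower than `payoff(100, t)` for any opposing claim `t`. Calling `evolve()` and inspecting `mean_claim` over the returned population is one way to see where the evolved distributions place their mass. Because crossover is a convex combination and mutation renormalizes, every genome remains a valid probability distribution throughout.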

