Comparison of different selection strategies in Monte-Carlo Tree Search for the game of Tron

2012 IEEE Conference on Computational Intelligence and Games (CIG). Pub Date: 2012-12-06. DOI: 10.1109/CIG.2012.6374162

Pierre Perick, D. St-Pierre, Francis Maes, D. Ernst


Citations: 40

Abstract

Monte-Carlo Tree Search (MCTS) techniques are best known for their performance on turn-based games, such as Go, for which players have considerable time for choosing their moves. In this paper, we apply MCTS to the game of Tron, a simultaneous real-time two-player game. The fact that players have to react fast and that moves occur simultaneously creates an unusual setting for MCTS, in which classical selection policies such as UCB1 may be suboptimal. We perform an empirical comparison of a wide range of selection policies for MCTS applied to Tron, with both deterministic policies (UCB1, UCB1-Tuned, UCB-V, UCB-Minimal, OMC-Deterministic, MOSS) and stochastic policies (ε_n-greedy, EXP3, Thompson Sampling, OMC-Stochastic, PBBM). From the experiments, we observe that UCB1-Tuned has the best behavior, closely followed by UCB1. Even if UCB-Minimal is ranked fourth, this is a remarkable result for this recently introduced selection policy found through automatic discovery of good policies on generic multi-armed bandit problems. We also show that deterministic policies perform better than stochastic ones for this problem.
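To make the two top-ranked policies concrete, the following is a minimal Python sketch of UCB1 and UCB1-Tuned as they are usually stated in the multi-armed bandit literature, with rewards assumed to lie in [0, 1]. It is not taken from the paper: the exploration constants, the `ArmStats` container, and the `select` helper are illustrative assumptions, and the authors' exact parametrization may differ.

```python
import math
from dataclasses import dataclass


@dataclass
class ArmStats:
    """Running statistics for one candidate move (hypothetical container)."""
    pulls: int = 0              # number of times this move was selected
    reward_sum: float = 0.0     # sum of observed rewards, assumed in [0, 1]
    reward_sq_sum: float = 0.0  # sum of squared rewards, used for the variance

    def update(self, reward: float) -> None:
        self.pulls += 1
        self.reward_sum += reward
        self.reward_sq_sum += reward * reward

    @property
    def mean(self) -> float:
        return self.reward_sum / self.pulls


def ucb1_score(arm: ArmStats, total_pulls: int) -> float:
    """Textbook UCB1: empirical mean plus sqrt(2 ln n / n_j)."""
    return arm.mean + math.sqrt(2.0 * math.log(total_pulls) / arm.pulls)


def ucb1_tuned_score(arm: ArmStats, total_pulls: int) -> float:
    """Textbook UCB1-Tuned: the bonus is scaled by a variance upper bound."""
    log_term = math.log(total_pulls) / arm.pulls
    variance = arm.reward_sq_sum / arm.pulls - arm.mean ** 2
    variance_bound = variance + math.sqrt(2.0 * log_term)
    # Clamp at 0 to guard against tiny negative values from rounding.
    scale = max(0.0, min(0.25, variance_bound))
    return arm.mean + math.sqrt(log_term * scale)


def select(arms, score) -> int:
    """Deterministic selection: try each untried move once, then take the argmax."""
    for index, arm in enumerate(arms):
        if arm.pulls == 0:
            return index
    total_pulls = sum(arm.pulls for arm in arms)
    return max(range(len(arms)), key=lambda i: score(arms[i], total_pulls))
```

In an MCTS setting, `select` would be called at each tree node with that node's per-move statistics, and `update` would be called during backpropagation. Because the variance-based factor is capped at 1/4, the UCB1-Tuned bonus in this sketch is never larger than the UCB1 bonus, so it explores less aggressively on low-variance moves.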
