Pierre Perick, D. St-Pierre, Francis Maes, D. Ernst
2012 IEEE Conference on Computational Intelligence and Games (CIG), published 2012-12-06. DOI: 10.1109/CIG.2012.6374162 (https://doi.org/10.1109/CIG.2012.6374162)
Comparison of different selection strategies in Monte-Carlo Tree Search for the game of Tron
Monte-Carlo Tree Search (MCTS) techniques are best known for their performance on turn-based games such as Go, in which players have considerable time to choose their moves. In this paper, we apply MCTS to the game of Tron, a simultaneous, real-time, two-player game. Because players must react quickly and moves occur simultaneously, Tron is an unusual setting for MCTS, one in which classical selection policies such as UCB1 may be suboptimal. We perform an empirical comparison of a wide range of selection policies for MCTS applied to Tron, covering both deterministic policies (UCB1, UCB1-Tuned, UCB-V, UCB-Minimal, OMC-Deterministic, MOSS) and stochastic policies (ε_n-greedy, EXP3, Thompson Sampling, OMC-Stochastic, PBBM). In our experiments, UCB1-Tuned performs best, closely followed by UCB1. UCB-Minimal ranks only fourth, but this is still a remarkable result for a recently introduced selection policy that was found through automatic discovery of good policies on generic multi-armed bandit problems. We also show that deterministic policies perform better than stochastic ones on this problem.
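To make the notion of a selection policy concrete, the following minimal Python sketch shows the two top-ranked deterministic policies, UCB1 and UCB1-Tuned, in their textbook multi-armed bandit form. It is an illustration only, not the implementation evaluated in the paper: the names `ArmStats`, `ucb1`, `ucb1_tuned`, and `select` are hypothetical, rewards are assumed to lie in [0, 1], and the exploration constant `c` is an assumed default rather than a tuned value.

```python
import math
from dataclasses import dataclass


@dataclass
class ArmStats:
    """Running statistics for one child move (hypothetical helper)."""
    count: int = 0              # number of times this move was selected
    reward_sum: float = 0.0     # sum of observed rewards
    reward_sq_sum: float = 0.0  # sum of squared rewards (for the variance)

    def update(self, reward: float) -> None:
        self.count += 1
        self.reward_sum += reward
        self.reward_sq_sum += reward * reward


def ucb1(arm: ArmStats, total: int, c: float = math.sqrt(2)) -> float:
    """UCB1 score: empirical mean plus c * sqrt(ln(total) / count)."""
    mean = arm.reward_sum / arm.count
    return mean + c * math.sqrt(math.log(total) / arm.count)


def ucb1_tuned(arm: ArmStats, total: int) -> float:
    """UCB1-Tuned score: the bonus is scaled by a variance upper bound,
    capped at 1/4 (the largest variance of a reward in [0, 1])."""
    mean = arm.reward_sum / arm.count
    log_term = math.log(total) / arm.count
    variance_bound = (arm.reward_sq_sum / arm.count - mean * mean
                      + math.sqrt(2.0 * log_term))
    return mean + math.sqrt(log_term * min(0.25, variance_bound))


def select(arms, score) -> int:
    """Return the index of the move to explore next.

    Unvisited moves are tried first; otherwise the move with the
    highest score wins, which makes the policy deterministic.
    """
    for i, arm in enumerate(arms):
        if arm.count == 0:
            return i
    total = sum(arm.count for arm in arms)
    return max(range(len(arms)), key=lambda i: score(arms[i], total))
```

Inside MCTS, a function like `select` would be called at each tree node during the descent, and `ArmStats.update` would be called with the simulation outcome during backpropagation. Stochastic policies such as ε_n-greedy or Thompson Sampling differ in that they draw the move at random rather than taking an argmax.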