James Pettit, D. Helmbold. FDG: Proceedings of the International Conference on the Foundations of Digital Games, 2012, pp. 212-219. DOI: 10.1145/2282338.2282379
Evolutionary learning of policies for MCTS simulations
Monte-Carlo Tree Search (MCTS) grows a partial game tree and uses a large number of random simulations to approximate the values of its nodes. It has proven effective in games such as Go and Hex, where the large search space and the difficulty of evaluating positions hamper standard methods. The best MCTS players use carefully hand-crafted rules to bias the random simulations. Obtaining good hand-crafted rules is very difficult, as even rules that promote better simulation play can result in a weaker MCTS system [12]. Our Hivemind system uses evolution strategies to automatically learn effective rules for biasing the random simulations. We have built an MCTS player using Hivemind for the game of Hex. The rules learned by Hivemind yield a 90% win rate against a baseline MCTS system and a significant improvement against the computer Hex world champion, MoHex.
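The two ideas the abstract combines can be sketched in a few lines: a rollout policy that biases "random" simulation moves by a weighted sum of move features, and an evolution-strategy step that perturbs those weights and keeps the perturbation if it scores better. This is a minimal illustrative sketch, not the paper's Hivemind implementation; the feature function, the softmax-style weighting, and the (1+1)-ES update are assumptions chosen for brevity.

```python
import math
import random

def biased_rollout_move(moves, features, weights, rng):
    """Pick a simulation move with probability proportional to
    exp(weights . features(move)) -- a biased, rather than uniform,
    rollout policy. `features` maps a move to a list of floats."""
    scores = [math.exp(sum(w * f for w, f in zip(weights, features(m))))
              for m in moves]
    total = sum(scores)
    r = rng.random() * total
    acc = 0.0
    for m, s in zip(moves, scores):
        acc += s
        if acc >= r:
            return m
    return moves[-1]  # guard against floating-point rounding

def es_step(weights, fitness, sigma, rng):
    """One (1+1) evolution-strategy step: perturb every weight with
    Gaussian noise and keep the child only if its fitness (e.g. win
    rate of the resulting MCTS player) is at least as good."""
    child = [w + rng.gauss(0.0, sigma) for w in weights]
    return child if fitness(child) >= fitness(weights) else weights
```

In the paper's setting, `fitness` would be an expensive estimate of playing strength (games against a fixed opponent), which is why learning good biasing rules automatically, rather than hand-crafting them, is the hard part.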