James Pettit, D. Helmbold. FDG: Proceedings of the International Conference on the Foundations of Digital Games, 2012, pp. 212-219. DOI: 10.1145/2282338.2282379
Evolutionary learning of policies for MCTS simulations
Monte-Carlo Tree Search (MCTS) grows a partial game tree and uses a large number of random simulations to approximate the values of its nodes. It has proven effective in games such as Go and Hex, where the large search space and the difficulty of evaluating positions hamper standard methods. The best MCTS players use carefully hand-crafted rules to bias the random simulations. Obtaining good hand-crafted rules is very difficult, as even rules that promote better simulation play can result in a weaker MCTS system [12]. Our Hivemind system uses evolution strategies to automatically learn effective rules for biasing the random simulations. We have built an MCTS player using Hivemind for the game of Hex. The rules learned by Hivemind yield a 90% win rate against a baseline MCTS system and a significant improvement against the computer Hex world champion, MoHex.
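The two ideas the abstract combines can be sketched in a few lines: a rollout policy that biases "random" simulation moves by a weighted sum of move features, and an evolution-strategy step that perturbs those weights and keeps the perturbation if it scores better. This is a minimal illustrative sketch, not the paper's Hivemind implementation; the feature function, the softmax-style weighting, and the (1+1)-ES update are assumptions chosen for brevity.

```python
import math
import random

def biased_rollout_move(moves, features, weights, rng):
    """Pick a simulation move with probability proportional to
    exp(weights . features(move)) -- a biased, rather than uniform,
    rollout policy. `features` maps a move to a list of floats."""
    scores = [math.exp(sum(w * f for w, f in zip(weights, features(m))))
              for m in moves]
    total = sum(scores)
    r = rng.random() * total
    acc = 0.0
    for m, s in zip(moves, scores):
        acc += s
        if acc >= r:
            return m
    return moves[-1]  # guard against floating-point rounding

def es_step(weights, fitness, sigma, rng):
    """One (1+1) evolution-strategy step: perturb every weight with
    Gaussian noise and keep the child only if its fitness (e.g. win
    rate of the resulting MCTS player) is at least as good."""
    child = [w + rng.gauss(0.0, sigma) for w in weights]
    return child if fitness(child) >= fitness(weights) else weights
```

In the paper's setting, `fitness` would be an expensive estimate of playing strength (games against a fixed opponent), which is why learning good biasing rules automatically, rather than hand-crafting them, is the hard part.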