Prediction Distortion in Monte Carlo Tree Search and an Improved Algorithm

Journal of Intelligent Learning Systems and Applications, Vol. 10, pp. 46-79. Pub Date: 2018-05-07. DOI: 10.4236/jilsa.2018.102004

William Li


Citations: 1

Abstract



Teaching computer programs to play games through machine learning has been an important way to advance artificial intelligence (AI) in a variety of real-world applications. Monte Carlo Tree Search (MCTS) is one of the key AI techniques developed recently, and it enabled AlphaGo to defeat a legendary professional Go player. What makes MCTS particularly attractive is that it needs only the basic rules of the game and does not rely on expert-level knowledge. Researchers thus expect that MCTS can be applied to other complex AI problems where domain-specific expert-level knowledge is not yet available. So far, however, there are very few analytic studies of MCTS in the literature. In this paper, our goal is to develop such analytic studies to build a more fundamental understanding of the algorithms and their applicability to complex AI problems. We start with a simple version of MCTS, called random playout search (RPS), to play Tic-Tac-Toe, and find that RPS may fail to discover the correct move even in a very simple Tic-Tac-Toe position. Both probability analysis and simulation confirm this finding. We then study the full version of MCTS playing Gomoku and find that, while MCTS has shown great success in more sophisticated games like Go, it is not effective at handling sudden death/win. The main reason that MCTS often fails to detect sudden death/win lies in its random playout search, which leads to prediction distortion. Although MCTS in theory converges to the optimal minimax search, under real-world computational resource constraints it has to rely on RPS as an important step in its search process, and it therefore suffers from the same fundamental prediction distortion problem as RPS. By examining the detailed statistics of the scores in MCTS, we investigate a variety of scenarios where MCTS fails to detect sudden death/win. Finally, we propose an improved MCTS algorithm that incorporates minimax search to overcome prediction distortion. Our simulation confirms the effectiveness of the proposed algorithm. We provide an estimate of the additional computational cost that this new algorithm incurs to detect sudden death/win and discuss heuristic strategies to further reduce the search complexity.
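
The two ideas in the abstract, scoring moves by random playouts and guarding against sudden death/win with minimax, can be illustrated with a small Tic-Tac-Toe sketch. This is not the paper's implementation: it uses plain RPS rather than full MCTS, and every name (`rps_scores`, `forced_result`, `choose_move`), the +1/0/-1 scoring, and the default playout count and search depth are assumptions chosen for illustration.

```python
import random

# The eight winning lines on a 3x3 board with cells indexed 0..8 (row-major).
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if some line is complete, otherwise None."""
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None


def legal_moves(board):
    return [i for i, cell in enumerate(board) if cell == ' ']


def other(player):
    return 'O' if player == 'X' else 'X'


def random_playout(board, to_move, rng):
    """Play uniformly random moves to the end; return the winner (None = draw)."""
    board = list(board)
    while winner(board) is None and legal_moves(board):
        board[rng.choice(legal_moves(board))] = to_move
        to_move = other(to_move)
    return winner(board)


def rps_scores(board, player, playouts, rng):
    """Random playout search: mean result (+1 win, 0 draw, -1 loss) per move."""
    scores = {}
    for move in legal_moves(board):
        child = list(board)
        child[move] = player
        total = 0
        for _ in range(playouts):
            w = random_playout(child, other(player), rng)
            total += 1 if w == player else (-1 if w is not None else 0)
        scores[move] = total / playouts
    return scores


def forced_result(board, to_move, player, depth):
    """Depth-limited minimax: +1 forced win for `player`, -1 forced loss,
    0 if nothing is decided within `depth` plies (or the game is drawn)."""
    w = winner(board)
    if w is not None:
        return 1 if w == player else -1
    moves = legal_moves(board)
    if depth == 0 or not moves:
        return 0
    results = []
    for m in moves:
        child = list(board)
        child[m] = to_move
        results.append(forced_result(child, other(to_move), player, depth - 1))
    return max(results) if to_move == player else min(results)


def choose_move(board, player, playouts=200, depth=2, seed=0):
    """Rank moves by RPS score, but let a shallow minimax check override it."""
    rng = random.Random(seed)
    scores = rps_scores(board, player, playouts, rng)
    safe = []
    for move in scores:
        child = list(board)
        child[move] = player
        result = forced_result(child, other(player), player, depth - 1)
        if result == 1:
            return move          # sudden win: take it immediately
        if result == 0:
            safe.append(move)    # no forced loss found within the depth limit
    candidates = safe or list(scores)  # if every move loses, fall back to RPS
    return max(candidates, key=lambda m: scores[m])
```

For example, with `O` on cells 0 and 1 and `X` on cells 4 and 8, `choose_move(['O', 'O', ' ', ' ', 'X', ' ', ' ', ' ', 'X'], 'X')` returns 2: the depth-2 check marks every other move as an immediate loss, so the block is chosen whatever the playout averages say. The extra cost of the guard grows with the branching factor raised to the chosen depth, which is the kind of overhead the abstract says the paper estimates.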

