Research Summary

H. Baier
{"title":"Research Summary","authors":"H. Baier","doi":"10.1609/aiide.v8i6.12481","DOIUrl":null,"url":null,"abstract":"\n \n Monte-Carlo Tree Search (MCTS) is an online planning algorithm that combines the ideas of best-first tree search and Monte-Carlo evaluation. Since MCTS is based on sampling, it does not require a transition function in explicit form, but only a generative model of the domain. Because it grows a highly selective search tree guided by its samples, it can handle huge search spaces with large branching factors. By using Monte-Carlo playouts, MCTS can take long-term rewards into account even with distant horizons. Combined with multi-armed bandit algorithms to trade off exploration and exploitation, MCTS has been shown to guarantee asymptotic convergence to the optimal policy, while providing approximations when stopped at any time. The relatively new MCTS approach has started a revolution in computer Go. Furthermore, it has achieved considerable success in domains as diverse as the games of Hex, Amazons, LOA, and Ms. Pacman; in General Game Playing, planning, and optimization. Whereas the focus of previous MCTS research has been on the practical application, current research begins to address the problem of understanding the nature, the underlying principles, of MCTS. A careful understanding of MCTS will lead to more effective search algorithms. Hence, my two interrelated research questions are: How can we formulate models that increase our understanding of how MCTS works? and How can we use the developed understanding to create effective search algorithms? This research summary describes the first steps I undertook in these directions, as well as my plans for future work.\n \n","PeriodicalId":249108,"journal":{"name":"Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment","volume":"95 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1609/aiide.v8i6.12481","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Monte-Carlo Tree Search (MCTS) is an online planning algorithm that combines the ideas of best-first tree search and Monte-Carlo evaluation. Since MCTS is based on sampling, it does not require a transition function in explicit form, but only a generative model of the domain. Because it grows a highly selective search tree guided by its samples, it can handle huge search spaces with large branching factors. By using Monte-Carlo playouts, MCTS can take long-term rewards into account even over distant horizons. Combined with multi-armed bandit algorithms to trade off exploration and exploitation, MCTS has been shown to converge asymptotically to the optimal policy, while providing an approximate answer whenever it is stopped early. The relatively new MCTS approach has started a revolution in computer Go. It has also achieved considerable success in domains as diverse as the games of Hex, Amazons, LOA, and Ms. Pac-Man, as well as in General Game Playing, planning, and optimization. Whereas previous MCTS research has focused on practical applications, current research is beginning to address the problem of understanding the nature, that is, the underlying principles, of MCTS. A careful understanding of MCTS will lead to more effective search algorithms. Hence, my two interrelated research questions are: How can we formulate models that increase our understanding of how MCTS works? And how can we use the resulting understanding to create effective search algorithms? This research summary describes the first steps I have undertaken in these directions, as well as my plans for future work.
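For readers unfamiliar with the algorithm the abstract describes, the sketch below illustrates the four standard MCTS phases: selection with the UCB1 bandit rule, expansion, a random Monte-Carlo playout using only a generative model, and backpropagation. It is not taken from the paper; the state interface (`legal_moves()`, `apply()`, `is_terminal()`, `reward()`) is a hypothetical minimal one, rewards are assumed to lie in [0, 1] from the perspective of a single maximizing agent, and two-player games would additionally need the usual sign flip during backpropagation.

```python
# Minimal UCT (MCTS with UCB1 selection) sketch -- illustrative only.
# Assumes a hypothetical state interface: legal_moves(), apply(move) -> new
# state, is_terminal(), and reward() returning a value in [0, 1].

import math
import random


class Node:
    def __init__(self, state, parent=None, move=None):
        self.state = state                      # state reached at this node
        self.parent = parent
        self.move = move                        # move that led here from the parent
        self.children = []
        self.untried = list(state.legal_moves())
        self.visits = 0
        self.value = 0.0                        # sum of playout rewards

    def ucb1(self, c=1.4):
        # Exploitation term plus exploration bonus (multi-armed bandit rule).
        return (self.value / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))


def uct_search(root_state, iterations=1000):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend the tree, picking the child with the best UCB1 score.
        while not node.untried and node.children:
            node = max(node.children, key=Node.ucb1)
        # 2. Expansion: add one new child for a randomly chosen untried move.
        if node.untried:
            move = node.untried.pop(random.randrange(len(node.untried)))
            node = Node(node.state.apply(move), parent=node, move=move)
            node.parent.children.append(node)
        # 3. Simulation: random Monte-Carlo playout using only the generative model.
        state = node.state
        while not state.is_terminal():
            state = state.apply(random.choice(state.legal_moves()))
        reward = state.reward()
        # 4. Backpropagation: update visit counts and value sums along the path.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Recommend the most-visited root move (a common final-move selection rule).
    return max(root.children, key=lambda n: n.visits).move
```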