Regulation of exploration for simple regret minimization in Monte-Carlo tree search

Yun-Ching Liu, Yoshimasa Tsuruoka
{"title":"蒙特卡洛树搜索中简单遗憾最小化的探索规则","authors":"Yun-Ching Liu, Yoshimasa Tsuruoka","doi":"10.1109/CIG.2015.7317923","DOIUrl":null,"url":null,"abstract":"The application of multi-armed bandit (MAB) algorithms was a critical step in the development of Monte-Carlo tree search (MCTS). One example would be the UCT algorithm, which applies the UCB bandit algorithm. Various research has been conducted on applying other bandit algorithms to MCTS. Simple regret bandit algorithms, which aim to identify the optimal arm after a number of trials, have been of great interest in various fields in recent years. However, the simple regret bandit algorithm has the tendency to spend more time on sampling suboptimal arms, which may be a problem in the context of game tree search. In this research, we will propose combined confidence bounds, which utilize the characteristics of the confidence bounds of the improved UCB and UCB √· algorithms to regulate exploration for simple regret minimization in MCTS. We will demonstrate the combined confidence bounds bandit algorithm has better empirical performance than that of the UCB algorithm on the MAB problem. We will show that the combined confidence bounds MCTS (CCB-MCTS) has better performance over plain UCT on the game of 9 × 9 Go, and has shown good scalability. We will also show that the performance of CCB-MCTS can be further enhanced with the application of all-moves-as-first (AMAF) heuristic.","PeriodicalId":244862,"journal":{"name":"2015 IEEE Conference on Computational Intelligence and Games (CIG)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"Regulation of exploration for simple regret minimization in Monte-Carlo tree search\",\"authors\":\"Yun-Ching Liu, Yoshimasa Tsuruoka\",\"doi\":\"10.1109/CIG.2015.7317923\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The application of multi-armed bandit (MAB) algorithms was a critical step in the development of Monte-Carlo tree search (MCTS). One example would be the UCT algorithm, which applies the UCB bandit algorithm. Various research has been conducted on applying other bandit algorithms to MCTS. Simple regret bandit algorithms, which aim to identify the optimal arm after a number of trials, have been of great interest in various fields in recent years. However, the simple regret bandit algorithm has the tendency to spend more time on sampling suboptimal arms, which may be a problem in the context of game tree search. In this research, we will propose combined confidence bounds, which utilize the characteristics of the confidence bounds of the improved UCB and UCB √· algorithms to regulate exploration for simple regret minimization in MCTS. We will demonstrate the combined confidence bounds bandit algorithm has better empirical performance than that of the UCB algorithm on the MAB problem. We will show that the combined confidence bounds MCTS (CCB-MCTS) has better performance over plain UCT on the game of 9 × 9 Go, and has shown good scalability. 
We will also show that the performance of CCB-MCTS can be further enhanced with the application of all-moves-as-first (AMAF) heuristic.\",\"PeriodicalId\":244862,\"journal\":{\"name\":\"2015 IEEE Conference on Computational Intelligence and Games (CIG)\",\"volume\":\"45 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-11-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2015 IEEE Conference on Computational Intelligence and Games (CIG)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CIG.2015.7317923\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE Conference on Computational Intelligence and Games (CIG)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CIG.2015.7317923","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7

Abstract

The application of multi-armed bandit (MAB) algorithms was a critical step in the development of Monte-Carlo tree search (MCTS). One example is the UCT algorithm, which applies the UCB bandit algorithm. Various studies have investigated applying other bandit algorithms to MCTS. Simple regret bandit algorithms, which aim to identify the optimal arm after a number of trials, have attracted great interest in various fields in recent years. However, simple regret bandit algorithms tend to spend more time sampling suboptimal arms, which may be a problem in the context of game tree search. In this research, we propose combined confidence bounds, which exploit the characteristics of the confidence bounds of the improved UCB and UCB√· algorithms to regulate exploration for simple regret minimization in MCTS. We demonstrate that the combined confidence bounds bandit algorithm empirically outperforms the UCB algorithm on the MAB problem. We show that combined confidence bounds MCTS (CCB-MCTS) outperforms plain UCT on the game of 9 × 9 Go and exhibits good scalability. We also show that the performance of CCB-MCTS can be further enhanced with the application of the all-moves-as-first (AMAF) heuristic.
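To make the cumulative-regret versus simple-regret distinction concrete, below is a minimal Python sketch contrasting a UCB1 selection rule with a UCB√·-style rule on a Bernoulli MAB problem, measuring simple regret (the value gap of the arm recommended at the end of the budget). It assumes the UCB√·-style rule simply replaces UCB1's logarithmic exploration term with a square-root term; the constants, the improved UCB component, and the paper's exact combined confidence bound are not reproduced here.

import math
import random

def ucb1_index(mean, n_i, n, c=math.sqrt(2)):
    # UCB1: logarithmic exploration term, tuned for cumulative regret.
    return mean + c * math.sqrt(math.log(n) / n_i)

def ucb_sqrt_index(mean, n_i, n, c=math.sqrt(2)):
    # UCB-sqrt-style bound (an assumption for illustration): sqrt(n)
    # replaces log(n), exploring more aggressively, which suits
    # simple-regret (best-arm identification) objectives.
    return mean + c * math.sqrt(math.sqrt(n) / n_i)

def run_bandit(arm_means, budget, index_fn, seed=0):
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(budget):
        if t < k:
            arm = t  # pull each arm once to initialise the estimates
        else:
            arm = max(range(k),
                      key=lambda i: index_fn(sums[i] / counts[i], counts[i], t))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0  # Bernoulli reward
        counts[arm] += 1
        sums[arm] += reward
    # Recommend the arm with the best empirical mean.
    best = max(range(k), key=lambda i: sums[i] / counts[i])
    # Simple regret: gap between the true best arm and the recommendation.
    return max(arm_means) - arm_means[best]

arms = [0.5, 0.45, 0.55, 0.4]
for name, fn in [("UCB1", ucb1_index), ("UCB-sqrt", ucb_sqrt_index)]:
    regret = sum(run_bandit(arms, 2000, fn, seed=s) for s in range(100)) / 100
    print(f"{name}: mean simple regret = {regret:.4f}")

The faster-growing √n term forces wider exploration, which typically costs cumulative reward but improves the chance of identifying the best arm within a fixed budget; regulating exactly this trade-off inside the search tree is what the combined confidence bounds of CCB-MCTS are designed for.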