蒙特卡罗模拟平衡重新审视

2016 IEEE Conference on Computational Intelligence and Games (CIG) Pub Date : 2016-09-01 DOI:10.1109/CIG.2016.7860411

Tobias Graf, M. Platzner

{"title":"蒙特卡罗模拟平衡重新审视","authors":"Tobias Graf, M. Platzner","doi":"10.1109/CIG.2016.7860411","DOIUrl":null,"url":null,"abstract":"Simulation Balancing is an optimization algorithm to automatically tune the parameters of a playout policy used inside a Monte Carlo Tree Search. The algorithm fits a policy so that the expected result of a policy matches given target values of the training set. Up to now it has been successfully applied to Computer Go on small 9 × 9 boards but failed for larger board sizes like 19 × 19. On these large boards apprenticeship learning, which fits a policy so that it closely follows an expert, continues to be the algorithm of choice. In this paper we introduce several improvements to the original simulation balancing algorithm and test their effectiveness in Computer Go. The proposed additions remove the necessity to generate target values by deep searches, optimize faster and make the algorithm less prone to overfitting. The experiments show that simulation balancing improves the playing strength of a Go program using apprenticeship learning by more than 200 ELO on the large board size 19 × 19.","PeriodicalId":6594,"journal":{"name":"2016 IEEE Conference on Computational Intelligence and Games (CIG)","volume":"20 1","pages":"1-7"},"PeriodicalIF":0.0000,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Monte-Carlo simulation balancing revisited\",\"authors\":\"Tobias Graf, M. Platzner\",\"doi\":\"10.1109/CIG.2016.7860411\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Simulation Balancing is an optimization algorithm to automatically tune the parameters of a playout policy used inside a Monte Carlo Tree Search. The algorithm fits a policy so that the expected result of a policy matches given target values of the training set. Up to now it has been successfully applied to Computer Go on small 9 × 9 boards but failed for larger board sizes like 19 × 19. On these large boards apprenticeship learning, which fits a policy so that it closely follows an expert, continues to be the algorithm of choice. In this paper we introduce several improvements to the original simulation balancing algorithm and test their effectiveness in Computer Go. The proposed additions remove the necessity to generate target values by deep searches, optimize faster and make the algorithm less prone to overfitting. The experiments show that simulation balancing improves the playing strength of a Go program using apprenticeship learning by more than 200 ELO on the large board size 19 × 19.\",\"PeriodicalId\":6594,\"journal\":{\"name\":\"2016 IEEE Conference on Computational Intelligence and Games (CIG)\",\"volume\":\"20 1\",\"pages\":\"1-7\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 IEEE Conference on Computational Intelligence and Games (CIG)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CIG.2016.7860411\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE Conference on Computational Intelligence and Games (CIG)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CIG.2016.7860411","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 4

摘要

仿真平衡是一种优化算法，用于自动调整蒙特卡罗树搜索中使用的播放策略参数。该算法拟合策略，使策略的预期结果与给定训练集的目标值相匹配。到目前为止，它已经成功地应用于计算机围棋小9 × 9板，但失败的较大的棋盘尺寸，如19 × 19。在这些大型董事会中，学徒制学习仍然是首选算法，它符合一项政策，因此它密切跟随专家。本文对原有的仿真平衡算法进行了改进，并对其在计算机围棋中的有效性进行了测试。所提出的加法消除了通过深度搜索生成目标值的必要性，优化速度更快，并且使算法不容易出现过拟合。实验表明，在19 × 19的大棋盘上，模拟平衡使采用学徒学习的围棋程序的下棋强度提高了200个ELO以上。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Monte-Carlo simulation balancing revisited

Simulation Balancing is an optimization algorithm to automatically tune the parameters of a playout policy used inside a Monte Carlo Tree Search. The algorithm fits a policy so that the expected result of a policy matches given target values of the training set. Up to now it has been successfully applied to Computer Go on small 9 × 9 boards but failed for larger board sizes like 19 × 19. On these large boards apprenticeship learning, which fits a policy so that it closely follows an expert, continues to be the algorithm of choice. In this paper we introduce several improvements to the original simulation balancing algorithm and test their effectiveness in Computer Go. The proposed additions remove the necessity to generate target values by deep searches, optimize faster and make the algorithm less prone to overfitting. The experiments show that simulation balancing improves the playing strength of a Go program using apprenticeship learning by more than 200 ELO on the large board size 19 × 19.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2016 IEEE Conference on Computational Intelligence and Games (CIG)

自引率

0.00%

发文量