More Human-Like Gameplay by Blending Policies From Supervised and Reinforcement Learning

IF 1.7 · CAS Tier 4 (Computer Science) · JCR Q3 (Computer Science, Artificial Intelligence)
Tatsuyoshi Ogawa;Chu-Hsuan Hsueh;Kokolo Ikeda
{"title":"More Human-Like Gameplay by Blending Policies From Supervised and Reinforcement Learning","authors":"Tatsuyoshi Ogawa;Chu-Hsuan Hsueh;Kokolo Ikeda","doi":"10.1109/TG.2024.3424668","DOIUrl":null,"url":null,"abstract":"Modeling human players' behaviors in games is a key challenge for making natural computer players, evaluating games, and generating content. To achieve better human–computer interaction, researchers have tried various methods to create human-like artificial intelligence. In chess and \n<italic>Go</i>\n, supervised learning with deep neural networks is known as one of the most effective ways to predict human moves. However, for many other games (e.g., \n<italic>Shogi</i>\n), it is hard to collect a similar amount of game records, resulting in poor move-matching accuracy of the supervised learning. We propose a method to compensate for the weakness of the supervised learning policy by Blending it with an AlphaZero-like reinforcement learning policy. Experiments on \n<italic>Shogi</i>\n showed that the Blend method significantly improved the move-matching accuracy over supervised learning models. Experiments on chess and \n<italic>Go</i>\n with a limited number of game records also showed similar results. The Blend method was effective with both medium and large numbers of games, particularly the medium case. We confirmed the robustness of the Blend model to the parameter and discussed the mechanism why the move-matching accuracy improves. In addition, we showed that the Blend model performed better than existing work that tried to improve the move-matching accuracy.","PeriodicalId":55977,"journal":{"name":"IEEE Transactions on Games","volume":"16 4","pages":"831-843"},"PeriodicalIF":1.7000,"publicationDate":"2024-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10595450","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Games","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10595450/","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

Modeling human players' behaviors in games is a key challenge for making natural computer players, evaluating games, and generating content. To achieve better human–computer interaction, researchers have tried various methods to create human-like artificial intelligence. In chess and Go, supervised learning with deep neural networks is known as one of the most effective ways to predict human moves. However, for many other games (e.g., Shogi), it is hard to collect a comparable amount of game records, resulting in poor move-matching accuracy for supervised learning. We propose a method that compensates for the weakness of the supervised learning policy by blending it with an AlphaZero-like reinforcement learning policy. Experiments on Shogi showed that the Blend method significantly improved move-matching accuracy over supervised learning models. Experiments on chess and Go with a limited number of game records showed similar results. The Blend method was effective with both medium and large numbers of games, particularly in the medium case. We confirmed the robustness of the Blend model to its blending parameter and discussed the mechanism by which the move-matching accuracy improves. In addition, we showed that the Blend model performed better than existing work aimed at improving move-matching accuracy.
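The abstract does not give the exact blending formula, only that a supervised-learning (SL) policy trained on human game records is combined with an AlphaZero-like reinforcement-learning (RL) policy. Below is a minimal sketch assuming a simple linear interpolation of the two move distributions; the function name `blend_policies` and the weight `lam` are hypothetical illustrations, not taken from the paper.

```python
import numpy as np

def blend_policies(p_sl: np.ndarray, p_rl: np.ndarray, lam: float = 0.5) -> np.ndarray:
    """Blend an SL (human-move) policy with an RL policy over the same legal moves.

    lam is a hypothetical blending weight: the abstract only says the model is
    robust to "the parameter" without specifying the actual formula.
    """
    blended = lam * p_sl + (1.0 - lam) * p_rl
    return blended / blended.sum()  # renormalize to guard against rounding error

# Toy usage: a position with three legal moves.
p_sl = np.array([0.6, 0.3, 0.1])   # supervised policy, trained on human games
p_rl = np.array([0.2, 0.7, 0.1])   # AlphaZero-like policy, trained by self-play
p = blend_policies(p_sl, p_rl, lam=0.7)
print("blended policy:", p, "-> predicted human move:", int(np.argmax(p)))
```

Under this reading, a weight near 1 keeps predictions close to the human-trained SL policy, while a lower weight lets the stronger RL policy fill in where human data is sparse; the abstract reports that the combination is robust to this parameter.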
Source journal: IEEE Transactions on Games (Engineering: Electrical and Electronic Engineering)
CiteScore: 4.60 · Self-citation rate: 8.70% · Articles per year: 87