Multistage Temporal Difference Learning for 2048-Like Games

Q2 Computer Science
Kun-Hao Yeh, I-Chen Wu, Chu-Hsuan Hsueh, Chia-Chuan Chang, Chao-Chin Liang, Chiang Han
{"title":"Multistage Temporal Difference Learning for 2048-Like Games","authors":"Kun-Hao Yeh, I-Chen Wu, Chu-Hsuan Hsueh, Chia-Chuan Chang, Chao-Chin Liang, Chiang Han","doi":"10.1109/TCIAIG.2016.2593710","DOIUrl":null,"url":null,"abstract":"Szubert and Jaśkowski successfully used temporal difference (TD) learning together with n -tuple networks for playing the game 2048. However, we observed a phenomenon that the programs based on TD learning still hardly reach large tiles. In this paper, we propose multistage TD (MS-TD) learning, a kind of hierarchical reinforcement learning method, to effectively improve the performance for the rates of reaching large tiles, which are good metrics to analyze the strength of 2048 programs. Our experiments showed significant improvements over the one without using MS-TD learning. Namely, using 3-ply expectimax search, the program with MS-TD learning reached 32768-tiles with a rate of 18.31%, while the one with TD learning did not reach any. After further tuned, our 2048 program reached 32768-tiles with a rate of 31.75% in 10,000 games, and one among these games even reached a 65536-tiles, which is the first ever reaching a 65536-tiles to our knowledge. In addition, MS-TD learning method can be easily applied to other 2048-like games, such as Threes. Based on MS-TD learning, our experiments for Threes also demonstrated similar performance improvement, where the program with MS-TD learning reached 6144-tiles with a rate of 7.83%, while the one with TD learning only reached 0.45%.","PeriodicalId":49192,"journal":{"name":"IEEE Transactions on Computational Intelligence and AI in Games","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2016-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TCIAIG.2016.2593710","citationCount":"21","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Computational Intelligence and AI in Games","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TCIAIG.2016.2593710","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"Computer Science","Score":null,"Total":0}
引用次数: 21

Abstract

Szubert and Jaśkowski successfully used temporal difference (TD) learning together with n -tuple networks for playing the game 2048. However, we observed a phenomenon that the programs based on TD learning still hardly reach large tiles. In this paper, we propose multistage TD (MS-TD) learning, a kind of hierarchical reinforcement learning method, to effectively improve the performance for the rates of reaching large tiles, which are good metrics to analyze the strength of 2048 programs. Our experiments showed significant improvements over the one without using MS-TD learning. Namely, using 3-ply expectimax search, the program with MS-TD learning reached 32768-tiles with a rate of 18.31%, while the one with TD learning did not reach any. After further tuned, our 2048 program reached 32768-tiles with a rate of 31.75% in 10,000 games, and one among these games even reached a 65536-tiles, which is the first ever reaching a 65536-tiles to our knowledge. In addition, MS-TD learning method can be easily applied to other 2048-like games, such as Threes. Based on MS-TD learning, our experiments for Threes also demonstrated similar performance improvement, where the program with MS-TD learning reached 6144-tiles with a rate of 7.83%, while the one with TD learning only reached 0.45%.
面向2048类游戏的多阶段时间差异学习
Szubert和Jaśkowski成功地将时间差分(TD)学习与n元组网络一起用于玩游戏2048。然而,我们观察到一个现象,基于TD学习的程序仍然很难达到大的瓷砖。在本文中,我们提出了多级TD (MS-TD)学习,这是一种分层强化学习方法,可以有效地提高达到大块的率的性能,这是分析2048个程序强度的良好指标。我们的实验表明,与不使用MS-TD学习的实验相比,我们的实验有了显著的改进。即使用3-ply expectimax搜索,MS-TD学习的程序达到32768个tile,率为18.31%,而TD学习的程序没有达到任何tile。经过进一步的调整,我们的2048程序在1万局游戏中达到了32768块,命中率为31.75%,其中有一款游戏甚至达到了65536块,这是我们所知的第一次达到65536块。此外,MS-TD的学习方法也可以很容易地应用到其他类似2048的游戏中,比如《Threes》。在MS-TD学习的基础上,我们对Threes的实验也显示了类似的性能提升,其中MS-TD学习的程序达到了6144块,速率为7.83%,而TD学习的程序只有0.45%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
IEEE Transactions on Computational Intelligence and AI in Games
IEEE Transactions on Computational Intelligence and AI in Games COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE-COMPUTER SCIENCE, SOFTWARE ENGINEERING
CiteScore
4.60
自引率
0.00%
发文量
0
审稿时长
>12 weeks
期刊介绍: Cessation. The IEEE Transactions on Computational Intelligence and AI in Games (T-CIAIG) publishes archival journal quality original papers in computational intelligence and related areas in artificial intelligence applied to games, including but not limited to videogames, mathematical games, human–computer interactions in games, and games involving physical objects. Emphasis is placed on the use of these methods to improve performance in and understanding of the dynamics of games, as well as gaining insight into the properties of the methods as applied to games. It also includes using games as a platform for building intelligent embedded agents for the real world. Papers connecting games to all areas of computational intelligence and traditional AI are considered.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信