Developing Agents for Complete DouDizhu Game Enhanced With Concurrent and Multistage Training Methods

IF 2.8 | CAS Zone 4 (Computer Science) | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Chuanfa Li;Kiminori Matsuzaki
{"title":"用并行和多阶段训练方法增强的斗地主游戏完整代理的开发","authors":"Chuanfa Li;Kiminori Matsuzaki","doi":"10.1109/TG.2025.3530476","DOIUrl":null,"url":null,"abstract":"<italic>DouDizhu</i> is an imperfect information game involving three players, with two different bidding and cardplay phases. The large state and action spaces of the game add to its complexity. Previous studies of <italic>DouDizhu</i> have primarily concentrated on the cardplay phase, which is the more challenging phase. As research on the cardplay agent deepens, researchers have become interested in how to train agents for the complete <italic>DouDizhu</i> game. However, recent studies have overlooked that a poor bidding agent, which always bids 3, hinders the training of the cardplay agent. To enhance the performance of the cardplay agent (and accordingly the complete agent), this study employs a concurrent training method, in which a bidding agent is trained together with a cardplay agent and can quickly learn a good bidding play. To overcome the training difficulty encountered when the complete agent trained with the game score target, we propose a multistage training method: in the initial stage, the complete agent aims to maximize the win rate; in subsequent stages, it gradually shifts to targeting the maximum game score. Our <monospace>CT-MS3-FullDouZero+</monospace> agent achieves the highest average game score at <bold>0.228</b> <inline-formula><tex-math>$\\pm$</tex-math></inline-formula> <bold>0.060</b>, while two competing agents, the state-of-the-art <monospace>CoG23+PerfectDou</monospace> and <monospace>ST-FullDouZero+</monospace>, recorded only <bold><inline-formula><tex-math>$-$</tex-math></inline-formula>0.002</b> <inline-formula><tex-math>$\\pm$</tex-math></inline-formula> <bold>0.058</b> and <bold><inline-formula><tex-math>$-$</tex-math></inline-formula>0.226</b> <inline-formula><tex-math>$\\pm$</tex-math></inline-formula> <bold>0.056</b>, respectively.","PeriodicalId":55977,"journal":{"name":"IEEE Transactions on Games","volume":"17 3","pages":"676-685"},"PeriodicalIF":2.8000,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Developing Agents for Complete DouDizhu Game Enhanced With Concurrent and Multistage Training Methods\",\"authors\":\"Chuanfa Li;Kiminori Matsuzaki\",\"doi\":\"10.1109/TG.2025.3530476\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<italic>DouDizhu</i> is an imperfect information game involving three players, with two different bidding and cardplay phases. The large state and action spaces of the game add to its complexity. Previous studies of <italic>DouDizhu</i> have primarily concentrated on the cardplay phase, which is the more challenging phase. As research on the cardplay agent deepens, researchers have become interested in how to train agents for the complete <italic>DouDizhu</i> game. However, recent studies have overlooked that a poor bidding agent, which always bids 3, hinders the training of the cardplay agent. To enhance the performance of the cardplay agent (and accordingly the complete agent), this study employs a concurrent training method, in which a bidding agent is trained together with a cardplay agent and can quickly learn a good bidding play. 
To overcome the training difficulty encountered when the complete agent trained with the game score target, we propose a multistage training method: in the initial stage, the complete agent aims to maximize the win rate; in subsequent stages, it gradually shifts to targeting the maximum game score. Our <monospace>CT-MS3-FullDouZero+</monospace> agent achieves the highest average game score at <bold>0.228</b> <inline-formula><tex-math>$\\\\pm$</tex-math></inline-formula> <bold>0.060</b>, while two competing agents, the state-of-the-art <monospace>CoG23+PerfectDou</monospace> and <monospace>ST-FullDouZero+</monospace>, recorded only <bold><inline-formula><tex-math>$-$</tex-math></inline-formula>0.002</b> <inline-formula><tex-math>$\\\\pm$</tex-math></inline-formula> <bold>0.058</b> and <bold><inline-formula><tex-math>$-$</tex-math></inline-formula>0.226</b> <inline-formula><tex-math>$\\\\pm$</tex-math></inline-formula> <bold>0.056</b>, respectively.\",\"PeriodicalId\":55977,\"journal\":{\"name\":\"IEEE Transactions on Games\",\"volume\":\"17 3\",\"pages\":\"676-685\"},\"PeriodicalIF\":2.8000,\"publicationDate\":\"2025-01-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Games\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10843314/\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Games","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10843314/","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

DouDizhu is an imperfect-information game involving three players, with two distinct phases: bidding and cardplay. The game's large state and action spaces add to its complexity. Previous studies of DouDizhu have concentrated primarily on the cardplay phase, which is the more challenging of the two. As research on cardplay agents has deepened, researchers have become interested in how to train agents for the complete DouDizhu game. However, recent studies have overlooked the fact that a poor bidding agent, one that always bids 3, hinders the training of the cardplay agent. To enhance the performance of the cardplay agent (and accordingly the complete agent), this study employs a concurrent training method, in which a bidding agent is trained together with a cardplay agent and quickly learns good bidding play. To overcome the training difficulty encountered when the complete agent is trained with the game-score target, we propose a multistage training method: in the initial stage, the complete agent aims to maximize the win rate; in subsequent stages, it gradually shifts to targeting the maximum game score. Our CT-MS3-FullDouZero+ agent achieves the highest average game score at 0.228 ± 0.060, while two competing agents, the state-of-the-art CoG23+PerfectDou and ST-FullDouZero+, recorded only -0.002 ± 0.058 and -0.226 ± 0.056, respectively.
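To make the multistage idea concrete, the Python sketch below shows one way a stage-dependent reward target could be scheduled, linearly blending a pure win/loss signal into the raw game score. This is a minimal illustration, not the paper's actual implementation: the linear blend, the function name stage_reward, and the reading of "MS3" as three stages are all assumptions.

def stage_reward(stage: int, num_stages: int, won: bool, game_score: float) -> float:
    # Hypothetical reward schedule: stage 0 optimizes pure win rate (+1/-1);
    # the final stage optimizes the game score alone; intermediate stages blend the two.
    win_reward = 1.0 if won else -1.0
    alpha = stage / max(num_stages - 1, 1)  # 0.0 at the first stage, 1.0 at the last
    return (1.0 - alpha) * win_reward + alpha * game_score

# Example with three stages (an assumed reading of the "3" in CT-MS3):
for stage in range(3):
    print(stage, stage_reward(stage, num_stages=3, won=True, game_score=2.0))

Under this reading, the staged schedule gives the agent a dense, easy-to-learn win/loss signal first and only then exposes it to the higher-variance game-score target, which is consistent with the training difficulty the abstract reports for direct game-score training.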
Source journal: IEEE Transactions on Games (Engineering - Electrical and Electronic Engineering)
CiteScore: 4.60
Self-citation rate: 8.70%
Articles per year: 87