Developing Agents for Complete DouDizhu Game Enhanced With Concurrent and Multistage Training Methods

IF 2.8 | CAS Zone 4 (Computer Science) | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Chuanfa Li;Kiminori Matsuzaki
{"title":"用并行和多阶段训练方法增强的斗地主游戏完整代理的开发","authors":"Chuanfa Li;Kiminori Matsuzaki","doi":"10.1109/TG.2025.3530476","DOIUrl":null,"url":null,"abstract":"<italic>DouDizhu</i> is an imperfect information game involving three players, with two different bidding and cardplay phases. The large state and action spaces of the game add to its complexity. Previous studies of <italic>DouDizhu</i> have primarily concentrated on the cardplay phase, which is the more challenging phase. As research on the cardplay agent deepens, researchers have become interested in how to train agents for the complete <italic>DouDizhu</i> game. However, recent studies have overlooked that a poor bidding agent, which always bids 3, hinders the training of the cardplay agent. To enhance the performance of the cardplay agent (and accordingly the complete agent), this study employs a concurrent training method, in which a bidding agent is trained together with a cardplay agent and can quickly learn a good bidding play. To overcome the training difficulty encountered when the complete agent trained with the game score target, we propose a multistage training method: in the initial stage, the complete agent aims to maximize the win rate; in subsequent stages, it gradually shifts to targeting the maximum game score. Our <monospace>CT-MS3-FullDouZero+</monospace> agent achieves the highest average game score at <bold>0.228</b> <inline-formula><tex-math>$\\pm$</tex-math></inline-formula> <bold>0.060</b>, while two competing agents, the state-of-the-art <monospace>CoG23+PerfectDou</monospace> and <monospace>ST-FullDouZero+</monospace>, recorded only <bold><inline-formula><tex-math>$-$</tex-math></inline-formula>0.002</b> <inline-formula><tex-math>$\\pm$</tex-math></inline-formula> <bold>0.058</b> and <bold><inline-formula><tex-math>$-$</tex-math></inline-formula>0.226</b> <inline-formula><tex-math>$\\pm$</tex-math></inline-formula> <bold>0.056</b>, respectively.","PeriodicalId":55977,"journal":{"name":"IEEE Transactions on Games","volume":"17 3","pages":"676-685"},"PeriodicalIF":2.8000,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Developing Agents for Complete DouDizhu Game Enhanced With Concurrent and Multistage Training Methods\",\"authors\":\"Chuanfa Li;Kiminori Matsuzaki\",\"doi\":\"10.1109/TG.2025.3530476\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<italic>DouDizhu</i> is an imperfect information game involving three players, with two different bidding and cardplay phases. The large state and action spaces of the game add to its complexity. Previous studies of <italic>DouDizhu</i> have primarily concentrated on the cardplay phase, which is the more challenging phase. As research on the cardplay agent deepens, researchers have become interested in how to train agents for the complete <italic>DouDizhu</i> game. However, recent studies have overlooked that a poor bidding agent, which always bids 3, hinders the training of the cardplay agent. To enhance the performance of the cardplay agent (and accordingly the complete agent), this study employs a concurrent training method, in which a bidding agent is trained together with a cardplay agent and can quickly learn a good bidding play. 
To overcome the training difficulty encountered when the complete agent trained with the game score target, we propose a multistage training method: in the initial stage, the complete agent aims to maximize the win rate; in subsequent stages, it gradually shifts to targeting the maximum game score. Our <monospace>CT-MS3-FullDouZero+</monospace> agent achieves the highest average game score at <bold>0.228</b> <inline-formula><tex-math>$\\\\pm$</tex-math></inline-formula> <bold>0.060</b>, while two competing agents, the state-of-the-art <monospace>CoG23+PerfectDou</monospace> and <monospace>ST-FullDouZero+</monospace>, recorded only <bold><inline-formula><tex-math>$-$</tex-math></inline-formula>0.002</b> <inline-formula><tex-math>$\\\\pm$</tex-math></inline-formula> <bold>0.058</b> and <bold><inline-formula><tex-math>$-$</tex-math></inline-formula>0.226</b> <inline-formula><tex-math>$\\\\pm$</tex-math></inline-formula> <bold>0.056</b>, respectively.\",\"PeriodicalId\":55977,\"journal\":{\"name\":\"IEEE Transactions on Games\",\"volume\":\"17 3\",\"pages\":\"676-685\"},\"PeriodicalIF\":2.8000,\"publicationDate\":\"2025-01-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Games\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10843314/\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Games","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10843314/","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

DouDizhu is an imperfect-information game involving three players, with two distinct phases: bidding and cardplay. The game's large state and action spaces add to its complexity. Previous studies of DouDizhu have concentrated primarily on the cardplay phase, which is the more challenging of the two. As research on cardplay agents has deepened, researchers have become interested in how to train agents for the complete DouDizhu game. However, recent studies have overlooked the fact that a poor bidding agent, one that always bids 3, hinders the training of the cardplay agent. To enhance the performance of the cardplay agent (and accordingly the complete agent), this study employs a concurrent training method, in which a bidding agent is trained together with a cardplay agent and quickly learns good bidding play. To overcome the training difficulty encountered when the complete agent is trained with the game-score target, we propose a multistage training method: in the initial stage, the complete agent aims to maximize the win rate; in subsequent stages, it gradually shifts to targeting the maximum game score. Our CT-MS3-FullDouZero+ agent achieves the highest average game score at 0.228 ± 0.060, while two competing agents, the state-of-the-art CoG23+PerfectDou and ST-FullDouZero+, recorded only -0.002 ± 0.058 and -0.226 ± 0.056, respectively.
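To make the multistage idea concrete, the Python sketch below shows one way a stage-dependent reward target could be scheduled, linearly blending a pure win/loss signal into the raw game score. This is a minimal illustration, not the paper's actual implementation: the linear blend, the function name stage_reward, and the reading of "MS3" as three stages are all assumptions.

def stage_reward(stage: int, num_stages: int, won: bool, game_score: float) -> float:
    # Hypothetical reward schedule: stage 0 optimizes pure win rate (+1/-1);
    # the final stage optimizes the game score alone; intermediate stages blend the two.
    win_reward = 1.0 if won else -1.0
    alpha = stage / max(num_stages - 1, 1)  # 0.0 at the first stage, 1.0 at the last
    return (1.0 - alpha) * win_reward + alpha * game_score

# Example with three stages (an assumed reading of the "3" in CT-MS3):
for stage in range(3):
    print(stage, stage_reward(stage, num_stages=3, won=True, game_score=2.0))

Under this reading, the staged schedule gives the agent a dense, easy-to-learn win/loss signal first and only then exposes it to the higher-variance game-score target, which is consistent with the training difficulty the abstract reports for direct game-score training.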
Source journal: IEEE Transactions on Games (Engineering - Electrical and Electronic Engineering)
CiteScore: 4.60
Self-citation rate: 8.70%
Articles per year: 87