{"title":"用并行和多阶段训练方法增强的斗地主游戏完整代理的开发","authors":"Chuanfa Li;Kiminori Matsuzaki","doi":"10.1109/TG.2025.3530476","DOIUrl":null,"url":null,"abstract":"<italic>DouDizhu</i> is an imperfect information game involving three players, with two different bidding and cardplay phases. The large state and action spaces of the game add to its complexity. Previous studies of <italic>DouDizhu</i> have primarily concentrated on the cardplay phase, which is the more challenging phase. As research on the cardplay agent deepens, researchers have become interested in how to train agents for the complete <italic>DouDizhu</i> game. However, recent studies have overlooked that a poor bidding agent, which always bids 3, hinders the training of the cardplay agent. To enhance the performance of the cardplay agent (and accordingly the complete agent), this study employs a concurrent training method, in which a bidding agent is trained together with a cardplay agent and can quickly learn a good bidding play. To overcome the training difficulty encountered when the complete agent trained with the game score target, we propose a multistage training method: in the initial stage, the complete agent aims to maximize the win rate; in subsequent stages, it gradually shifts to targeting the maximum game score. Our <monospace>CT-MS3-FullDouZero+</monospace> agent achieves the highest average game score at <bold>0.228</b> <inline-formula><tex-math>$\\pm$</tex-math></inline-formula> <bold>0.060</b>, while two competing agents, the state-of-the-art <monospace>CoG23+PerfectDou</monospace> and <monospace>ST-FullDouZero+</monospace>, recorded only <bold><inline-formula><tex-math>$-$</tex-math></inline-formula>0.002</b> <inline-formula><tex-math>$\\pm$</tex-math></inline-formula> <bold>0.058</b> and <bold><inline-formula><tex-math>$-$</tex-math></inline-formula>0.226</b> <inline-formula><tex-math>$\\pm$</tex-math></inline-formula> <bold>0.056</b>, respectively.","PeriodicalId":55977,"journal":{"name":"IEEE Transactions on Games","volume":"17 3","pages":"676-685"},"PeriodicalIF":2.8000,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Developing Agents for Complete DouDizhu Game Enhanced With Concurrent and Multistage Training Methods\",\"authors\":\"Chuanfa Li;Kiminori Matsuzaki\",\"doi\":\"10.1109/TG.2025.3530476\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<italic>DouDizhu</i> is an imperfect information game involving three players, with two different bidding and cardplay phases. The large state and action spaces of the game add to its complexity. Previous studies of <italic>DouDizhu</i> have primarily concentrated on the cardplay phase, which is the more challenging phase. As research on the cardplay agent deepens, researchers have become interested in how to train agents for the complete <italic>DouDizhu</i> game. However, recent studies have overlooked that a poor bidding agent, which always bids 3, hinders the training of the cardplay agent. To enhance the performance of the cardplay agent (and accordingly the complete agent), this study employs a concurrent training method, in which a bidding agent is trained together with a cardplay agent and can quickly learn a good bidding play. 
To overcome the training difficulty encountered when the complete agent trained with the game score target, we propose a multistage training method: in the initial stage, the complete agent aims to maximize the win rate; in subsequent stages, it gradually shifts to targeting the maximum game score. Our <monospace>CT-MS3-FullDouZero+</monospace> agent achieves the highest average game score at <bold>0.228</b> <inline-formula><tex-math>$\\\\pm$</tex-math></inline-formula> <bold>0.060</b>, while two competing agents, the state-of-the-art <monospace>CoG23+PerfectDou</monospace> and <monospace>ST-FullDouZero+</monospace>, recorded only <bold><inline-formula><tex-math>$-$</tex-math></inline-formula>0.002</b> <inline-formula><tex-math>$\\\\pm$</tex-math></inline-formula> <bold>0.058</b> and <bold><inline-formula><tex-math>$-$</tex-math></inline-formula>0.226</b> <inline-formula><tex-math>$\\\\pm$</tex-math></inline-formula> <bold>0.056</b>, respectively.\",\"PeriodicalId\":55977,\"journal\":{\"name\":\"IEEE Transactions on Games\",\"volume\":\"17 3\",\"pages\":\"676-685\"},\"PeriodicalIF\":2.8000,\"publicationDate\":\"2025-01-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Games\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10843314/\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Games","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10843314/","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Developing Agents for Complete DouDizhu Game Enhanced With Concurrent and Multistage Training Methods
DouDizhu is an imperfect-information game among three players, consisting of two distinct phases: bidding and cardplay. The game's large state and action spaces add to its complexity. Previous studies of DouDizhu have primarily concentrated on the cardplay phase, which is the more challenging of the two. As research on cardplay agents has deepened, researchers have become interested in training agents for the complete DouDizhu game. However, recent studies have overlooked the fact that a poor bidding agent, one that always bids 3, hinders the training of the cardplay agent. To enhance the performance of the cardplay agent (and hence the complete agent), this study employs a concurrent training method in which a bidding agent is trained together with a cardplay agent and quickly learns good bidding play. To overcome the difficulty encountered when the complete agent is trained directly with the game-score target, we propose a multistage training method: in the initial stage, the complete agent aims to maximize the win rate; in subsequent stages, it gradually shifts toward maximizing the game score. Our CT-MS3-FullDouZero+ agent achieves the highest average game score at 0.228 $\pm$ 0.060, while two competing agents, the state-of-the-art CoG23+PerfectDou and ST-FullDouZero+, record only $-$0.002 $\pm$ 0.058 and $-$0.226 $\pm$ 0.056, respectively.
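To make the multistage idea concrete, the following is a minimal sketch of a staged reward target: training starts from a pure win/loss signal and is blended toward the raw game score across stages. The function name, the number of stages, and the linear blending schedule are illustrative assumptions for exposition, not the authors' implementation.

```python
def multistage_reward(won: bool, game_score: float,
                      stage: int, num_stages: int = 3) -> float:
    """Return the training reward for a finished game at a given stage.

    Stage 0 uses only the win-rate target (+1 for a win, -1 for a loss);
    the final stage uses only the game score; intermediate stages
    interpolate linearly between the two targets. (Hypothetical schedule.)
    """
    win_signal = 1.0 if won else -1.0
    # Fraction of the reward weight shifted from win rate to game score.
    alpha = stage / (num_stages - 1) if num_stages > 1 else 1.0
    alpha = max(0.0, min(alpha, 1.0))
    return (1.0 - alpha) * win_signal + alpha * game_score

# Example: a won game worth a score of 2.0, evaluated at each stage.
for s in range(3):
    print(s, multistage_reward(won=True, game_score=2.0, stage=s))
# stage 0 -> 1.0 (win rate only), stage 1 -> 1.5, stage 2 -> 2.0 (score only)
```

Under such a schedule, the agent first learns a policy that wins reliably before the optimization target exposes it to the higher-variance game-score signal, which is the training difficulty the multistage method is designed to overcome.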