Youpeng Zhao;Yudong Lu;Jian Zhao;Wengang Zhou;Houqiang Li
IEEE Transactions on Games, vol. 16, no. 4, pp. 914-926, published 2024-07-03. DOI: 10.1109/TG.2024.3422396. https://ieeexplore.ieee.org/document/10584299/
DanZero+: Dominating the GuanDan Game Through Reinforcement Learning
Recent advances have enabled artificial intelligence (AI) to demonstrate expertise in intricate card games such as Mahjong, DouDizhu, and Texas Hold'em. In this work, we aim to develop an AI program for an exceptionally complex and popular card game called GuanDan. This game involves four players engaging in both competitive and cooperative play throughout a long process, posing great challenges for AI due to its expansive state and action spaces, long episode length, and complex rules. Employing reinforcement learning techniques, specifically deep Monte Carlo, and a distributed training framework, we first put forward an AI program named DanZero. Evaluation against baseline AI programs based on heuristic rules highlights the outstanding performance of our bot. To further enhance the AI's capabilities, we apply proximal policy optimization to GuanDan on the basis of DanZero. To address the challenges arising from the huge action space, which can significantly impair the performance of policy-based algorithms, we adopt the pretrained model to compress the action space and integrate action features into the model to bolster its generalization capabilities. Using these techniques, we obtain a new GuanDan AI program, DanZero+, which achieves superior performance compared to DanZero.
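The abstract does not specify how the pretrained model compresses the action space, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes that a pretrained value model supplies one score per legal action, that compression means keeping the top-k scored actions as candidates, and that the policy scores each candidate from the concatenation of state and action features. All names (`top_k_actions`, `policy_over_candidates`, `weights`) and the linear scoring function are hypothetical.

```python
import math


def top_k_actions(q_values, k):
    """Return indices of the k highest-valued actions (ties go to the lower index)."""
    order = sorted(range(len(q_values)), key=lambda i: (-q_values[i], i))
    return order[:k]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def policy_over_candidates(state_feat, action_feats, q_values, weights, k):
    """Build a policy distribution over only the top-k candidate actions.

    Assumption: q_values come from a pretrained value model (hypothetical here),
    and the policy is a linear scorer over [state features + action features].
    """
    candidates = top_k_actions(q_values, k)
    scores = []
    for idx in candidates:
        joint = state_feat + action_feats[idx]  # list concatenation
        scores.append(sum(w * x for w, x in zip(weights, joint)))
    return candidates, softmax(scores)


# Toy example: four legal actions, compressed to two candidates.
state_feat = [1.0, 0.0]
action_feats = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
q_values = [0.2, 0.9, 0.5, -0.1]   # stand-in for pretrained model outputs
weights = [0.5, 0.5, 1.0, -1.0]    # stand-in for learned policy parameters

candidates, probs = policy_over_candidates(
    state_feat, action_feats, q_values, weights, k=2
)
print(candidates, probs)
```

Because the policy head in this sketch outputs a distribution over k candidates rather than over every legal card combination, its output size stays fixed regardless of how many actions are legal, and scoring actions by their features lets the same parameters apply to actions not seen during training.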