DanZero+: Dominating the GuanDan Game Through Reinforcement Learning

IF 2.8 · CAS Zone 4 (Computer Science) · JCR Q3 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

IEEE Transactions on Games Pub Date : 2024-07-03 DOI:10.1109/TG.2024.3422396

Youpeng Zhao;Yudong Lu;Jian Zhao;Wengang Zhou;Houqiang Li

IEEE Transactions on Games, vol. 16, no. 4, pp. 914-926. https://ieeexplore.ieee.org/document/10584299/

Citations: 0

Abstract



Recent advances have enabled artificial intelligence (AI) to demonstrate expertise in intricate card games such as Mahjong, DouDizhu, and Texas Hold'em. In this work, we aim to develop an AI program for an exceptionally complex and popular card game called GuanDan. The game involves four players engaging in both competitive and cooperative play over a long process, and it poses great challenges for AI because of its expansive state and action spaces, long episode length, and complex rules. Employing reinforcement learning techniques, specifically deep Monte Carlo, together with a distributed training framework, we first put forward an AI program named DanZero. Evaluation against baseline AI programs based on heuristic rules highlights the outstanding performance of our bot. To further enhance the AI's capabilities, we then apply proximal policy optimization to GuanDan, building on DanZero. The huge action space significantly degrades the performance of policy-based algorithms; to address this, we use the pretrained model to compress the action space and integrate action features into the model to bolster its generalization. Using these techniques, we obtain a new GuanDan AI program, DanZero+, which outperforms DanZero.
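The abstract does not say how the pretrained model compresses the action space. One plausible reading, sketched below purely as an assumption, is to let the pretrained value model rank the legal actions, keep the top k, and then let the policy assign probabilities to those candidates from their action features. The names `compress_actions`, `policy_over_candidates`, `toy_q`, and `toy_score`, as well as the tuple encoding of actions, are hypothetical and do not come from the paper. State conditioning and the proximal policy optimization update itself are omitted.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical encoding: each action is a small feature vector.
Action = Tuple[float, ...]


def compress_actions(
    legal_actions: Sequence[Action],
    q_value: Callable[[Action], float],
    k: int,
) -> List[Action]:
    """Keep the k legal actions that the pretrained value model ranks highest."""
    ranked = sorted(legal_actions, key=q_value, reverse=True)
    return ranked[:k]


def policy_over_candidates(
    candidates: Sequence[Action],
    score: Callable[[Action], float],
) -> List[float]:
    """Softmax over per-action scores computed from action features."""
    logits = [score(a) for a in candidates]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy stand-in for the pretrained value model (assumption, not the real network).
def toy_q(action: Action) -> float:
    return sum(action)


# Toy stand-in for the policy head that scores action features (assumption).
def toy_score(action: Action) -> float:
    return action[0] - action[1]


legal = [(0.1, 0.9), (0.8, 0.3), (0.5, 0.4), (0.2, 0.1), (0.9, 0.9)]
candidates = compress_actions(legal, toy_q, k=3)
probs = policy_over_candidates(candidates, toy_score)
```

Under this reading, the policy only ever faces a fixed-size candidate set, however many legal moves a hand allows, and scoring actions by their features lets one set of weights apply to moves that never appeared at a fixed output index.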


Source journal: IEEE Transactions on Games (Engineering: Electrical and Electronic Engineering)

CiteScore: 4.60

Self-citation rate: 8.70%

Articles published