Research on the Imperfect Information Game of Four-Player Mahjong Based on Mix-PPO

Impact factor 2.8; Region 4 (Computer Science); JCR Q3 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)
Jia-Yang Wang;Ming-Yan Wang;Wang Zeng;Zi-An Zhong
Journal: IEEE Transactions on Games, vol. 17, no. 2, pp. 485-497
Publication date: 2024-12-04
DOI: 10.1109/TG.2024.3507107
URL: https://ieeexplore.ieee.org/document/10776753/
Open access: no
Citations: 0

Abstract

In recent years, deep reinforcement learning has performed well on many challenging tasks, including Go, MOBA games, and applications in industrial fields. Mahjong is a popular imperfect-information game, but its large amount of hidden information and complex rules make it very challenging to solve its decision-making problem and to build artificial intelligence that exceeds human-level play. To address these problems, this article proposes a feature encoding method and a model training strategy for four-player Chinese Mahjong. It also proposes the Mix-PPO algorithm, which combines the advantages of the PPO1 and PPO2 algorithms, and compares Mix-PPO with other approaches, including the traditional proximal policy optimization algorithm, deep-learning-related algorithms, and game search tree algorithms. The experimental results show that the feature encoding, the training strategy, and the Mix-PPO algorithm are effective for building a decision-making model for four-player Chinese Mahjong.
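
The abstract does not say how Mix-PPO combines PPO1 and PPO2, so the following is only an illustrative sketch, not the authors' method. It assumes that PPO1 refers to the KL-penalty form of proximal policy optimization and PPO2 to the clipped-surrogate form, which is a common naming but is not confirmed by the abstract. The function `mixed_objective` and its `weight` parameter are hypothetical and show just one conceivable way to blend the two per-sample objectives.

```python
import math


def clipped_surrogate(ratio, advantage, eps=0.2):
    """Clipped surrogate term (assumed here to correspond to 'PPO2')."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    # Take the pessimistic (smaller) of the unclipped and clipped terms.
    return min(ratio * advantage, clipped * advantage)


def kl_penalty_surrogate(ratio, advantage, kl, beta=1.0):
    """KL-penalized surrogate term (assumed here to correspond to 'PPO1')."""
    return ratio * advantage - beta * kl


def mixed_objective(samples, weight=0.5, eps=0.2, beta=1.0):
    """Hypothetical convex mix of the two surrogates, averaged over samples.

    `weight` is an illustrative assumption, not a parameter from the paper.
    Each sample is a tuple (new_logp, old_logp, advantage, kl_estimate).
    """
    total = 0.0
    for new_logp, old_logp, advantage, kl in samples:
        # Probability ratio between the new and the old policy.
        ratio = math.exp(new_logp - old_logp)
        clip_term = clipped_surrogate(ratio, advantage, eps)
        penalty_term = kl_penalty_surrogate(ratio, advantage, kl, beta)
        total += weight * clip_term + (1.0 - weight) * penalty_term
    return total / len(samples)
```

With `weight=1.0` the sketch reduces to the clipped objective alone, and with `weight=0.0` to the KL-penalized objective alone; the value to be maximized is the returned average.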
Source journal: IEEE Transactions on Games (Engineering-Electrical and Electronic Engineering)
CiteScore: 4.60
Self-citation rate: 8.70%
Articles published: 87