Research on the Imperfect Information Game of Four-Player Mahjong Based on Mix-PPO

IF 2.8 · CAS Zone 4 (Computer Science) · JCR Q3 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

IEEE Transactions on Games · Pub Date: 2024-12-04 · DOI: 10.1109/TG.2024.3507107

Jia-Yang Wang, Ming-Yan Wang, Wang Zeng, Zi-An Zhong

Vol. 17, No. 2, pp. 485-497 · IEEE Xplore: https://ieeexplore.ieee.org/document/10776753/

Citations: 0

Abstract

In recent years, deep reinforcement learning has performed well on many challenging tasks, including Go, MOBA games, and applications in industrial fields. Mahjong is a popular imperfect-information game, but because it involves a large amount of hidden information and complex rules, solving its decision-making problem and building an artificial intelligence that surpasses human-level play is very challenging. To address these problems, this article proposes a feature encoding method and a model training strategy for four-player Chinese Mahjong. In addition, it proposes the Mix-PPO algorithm, which combines the advantages of the traditional PPO1 and PPO2 algorithms, and compares Mix-PPO with other algorithms, including the traditional proximal policy optimization (PPO) algorithm, a deep-learning-related algorithm, and a game search tree algorithm. The experimental results demonstrate the effectiveness of the Mix-PPO algorithm, the Mahjong feature encoding method, and the training strategy for building a decision-making model for four-player Chinese Mahjong.
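The abstract does not define PPO1 and PPO2 or say how Mix-PPO combines them. Assuming the naming convention used in some course material, where "PPO1" is the adaptive KL-penalty variant and "PPO2" the clipped-surrogate variant of PPO (Schulman et al., 2017), the minimal sketch below computes both objectives from per-action log-probabilities and advantages. `hypothetical_mix_loss` shows one possible combination (the clipped surrogate plus a KL penalty); it is an illustrative assumption, not the authors' Mix-PPO, and every function name here is hypothetical.

```python
import math
from typing import List


def ratios(logp_new: List[float], logp_old: List[float]) -> List[float]:
    """Probability ratios r_t = pi_new(a_t | s_t) / pi_old(a_t | s_t)."""
    return [math.exp(n - o) for n, o in zip(logp_new, logp_old)]


def sample_kl(logp_new: List[float], logp_old: List[float]) -> float:
    """Estimate KL(pi_old || pi_new) from actions sampled under pi_old.

    Unbiased but noisy; it can even be negative on very small batches.
    """
    return sum(o - n for n, o in zip(logp_new, logp_old)) / len(logp_old)


def ppo2_clip_objective(logp_new: List[float], logp_old: List[float],
                        adv: List[float], eps: float = 0.2) -> float:
    """Clipped surrogate (ASSUMED to be 'PPO2'); higher is better."""
    terms = []
    for r, a in zip(ratios(logp_new, logp_old), adv):
        r_clipped = min(max(r, 1.0 - eps), 1.0 + eps)
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms.
        terms.append(min(r * a, r_clipped * a))
    return sum(terms) / len(terms)


def ppo1_kl_objective(logp_new: List[float], logp_old: List[float],
                      adv: List[float], beta: float) -> float:
    """KL-penalized surrogate (ASSUMED to be 'PPO1'); higher is better."""
    rs = ratios(logp_new, logp_old)
    surrogate = sum(r * a for r, a in zip(rs, adv)) / len(adv)
    return surrogate - beta * sample_kl(logp_new, logp_old)


def adapt_beta(beta: float, kl: float, kl_target: float) -> float:
    """Adaptive KL coefficient: halve beta if KL is small, double it if large."""
    if kl < kl_target / 1.5:
        return beta / 2.0
    if kl > kl_target * 1.5:
        return beta * 2.0
    return beta


def hypothetical_mix_loss(logp_new: List[float], logp_old: List[float],
                          adv: List[float], beta: float,
                          eps: float = 0.2) -> float:
    """HYPOTHETICAL mix: clipped surrogate minus a KL penalty, as a loss.

    Not the paper's Mix-PPO; only one illustrative way to combine the two.
    """
    objective = ppo2_clip_objective(logp_new, logp_old, adv, eps)
    objective -= beta * sample_kl(logp_new, logp_old)
    return -objective  # gradient-based optimizers minimize, so negate


if __name__ == "__main__":
    old = [math.log(0.5), math.log(0.5)]  # toy log-probs under pi_old
    new = [math.log(0.6), math.log(0.3)]  # toy log-probs under pi_new
    adv = [1.0, -1.0]                     # toy advantage estimates
    kl = sample_kl(new, old)
    print("PPO2 clip objective:", ppo2_clip_objective(new, old, adv))
    print("PPO1 KL objective:", ppo1_kl_objective(new, old, adv, beta=1.0))
    print("mix loss:", hypothetical_mix_loss(new, old, adv, beta=1.0))
    print("next beta:", adapt_beta(1.0, kl, kl_target=0.01))
```

In a training loop, `sample_kl` would be recomputed after each policy update and passed to `adapt_beta`, so the penalty tightens when the new policy drifts far from the old one and relaxes when it barely moves, while `eps` bounds each ratio regardless of `beta`.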

Source journal

IEEE Transactions on Games (Engineering-Electrical and Electronic Engineering)

CiteScore

4.60

Self-citation rate

8.70%

Articles published