Research on the Imperfect Information Game of Four-Player Mahjong Based on Mix-PPO
Authors: Jia-Yang Wang; Ming-Yan Wang; Wang Zeng; Zi-An Zhong
Journal: IEEE Transactions on Games, vol. 17, no. 2, pp. 485-497 (impact factor 2.8; JCR Q3, Computer Science, Artificial Intelligence)
Published: 2024-12-04
DOI: 10.1109/TG.2024.3507107
URL: https://ieeexplore.ieee.org/document/10776753/
Citations: 0
Abstract
In recent years, deep reinforcement learning has performed well in many challenging tasks, including Go, MOBA games, and industrial applications. Mahjong is a popular imperfect-information game, but its large amount of hidden information and complex rules make it very challenging to solve its decision-making problem and to build artificial intelligence that surpasses human-level play. To address these challenges, this article proposes a feature encoding method and a model training strategy for four-player Chinese Mahjong. It also proposes the Mix-PPO algorithm, which combines the advantages of the traditional PPO1 and PPO2 algorithms, and compares Mix-PPO with other algorithms, including traditional proximal policy optimization (PPO), deep-learning-based algorithms, and game-tree search algorithms. The experimental results show that the feature encoding, the training strategy, and the Mix-PPO algorithm are effective for building a decision-making model for four-player Chinese Mahjong.
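For orientation only: the abstract does not say how Mix-PPO combines PPO1 and PPO2, so the sketch below does not attempt to reproduce it. It shows the two surrogate objectives from the original PPO formulation, assuming the abstract uses "PPO1" for the KL-penalized variant and "PPO2" for the clipped variant, which is a common labeling but not confirmed by this record. The function names, the default values of `eps` and `beta`, and the sample numbers are all hypothetical.

```python
def clipped_surrogate(ratios, advantages, eps=0.2):
    """Mean clipped surrogate objective (assumed here to be 'PPO2').

    ratios[i] is pi_new(a|s) / pi_old(a|s) for sample i,
    advantages[i] is the advantage estimate for the same sample.
    """
    total = 0.0
    for r, a in zip(ratios, advantages):
        # Restrict the probability ratio to [1 - eps, 1 + eps].
        clipped_r = max(1.0 - eps, min(r, 1.0 + eps))
        # Take the pessimistic (smaller) of the two candidate terms.
        total += min(r * a, clipped_r * a)
    return total / len(ratios)


def kl_penalized_surrogate(ratios, advantages, kls, beta=1.0):
    """Mean KL-penalized surrogate objective (assumed here to be 'PPO1').

    kls[i] is the KL divergence between the old and new policy at
    the state of sample i; beta weights the penalty.
    """
    total = 0.0
    for r, a, kl in zip(ratios, advantages, kls):
        # Unclipped surrogate minus a penalty for moving away from pi_old.
        total += r * a - beta * kl
    return total / len(ratios)


if __name__ == "__main__":
    # Hypothetical toy batch of two samples.
    ratios = [1.5, 0.5]
    advantages = [1.0, -1.0]
    kls = [0.1, 0.1]
    print(clipped_surrogate(ratios, advantages, eps=0.2))
    print(kl_penalized_surrogate(ratios, advantages, kls, beta=2.0))
```

Both functions return a quantity to be maximized. The clipped form limits the update by bounding the probability ratio, while the penalized form limits it through an explicit KL term. Any scheme that mixes the two would need details that only the full paper provides.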