{"title":"Deep Multitask Multiagent Reinforcement Learning With Knowledge Transfer","authors":"Yuxiang Mai;Yifan Zang;Qiyue Yin;Wancheng Ni;Kaiqi Huang","doi":"10.1109/TG.2023.3316697","DOIUrl":"https://doi.org/10.1109/TG.2023.3316697","url":null,"abstract":"Despite the potential of multiagent reinforcement learning (MARL) in addressing numerous complex tasks, training a single team of MARL agents to handle multiple diverse team tasks remains a challenge. In this article, we introduce a novel Multitask method based on Knowledge Transfer in cooperative MARL (MKT-MARL). By learning from task-specific teachers, our approach empowers a single team of agents to attain expert-level performance in multiple tasks. MKT-MARL utilizes a knowledge distillation algorithm specifically designed for the multiagent architecture, which rapidly learns a team control policy incorporating common coordinated knowledge from the experience of task-specific teachers. In addition, we enhance this training with teacher annealing, gradually shifting the model's learning from distillation toward environmental rewards. This enhancement helps the multitask model surpass its single-task teachers. We extensively evaluate our algorithm using two commonly-used benchmarks: StarCraft II micromanagement and multiagent particle environment. The experimental results demonstrate that our algorithm outperforms both the single-task teachers and a jointly trained team of agents. Extensive ablation experiments illustrate the effectiveness of the supervised knowledge transfer and the teacher annealing strategy.","PeriodicalId":55977,"journal":{"name":"IEEE Transactions on Games","volume":"16 3","pages":"566-576"},"PeriodicalIF":1.7,"publicationDate":"2023-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135554802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Call for Papers—IEEE Transactions on Games Special Issue on Human-Centered AI in Game Evaluation","authors":"","doi":"10.1109/TG.2023.3312909","DOIUrl":"https://doi.org/10.1109/TG.2023.3312909","url":null,"abstract":"","PeriodicalId":55977,"journal":{"name":"IEEE Transactions on Games","volume":"15 3","pages":"492-492"},"PeriodicalIF":2.3,"publicationDate":"2023-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/7782673/10251473/10251484.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68027376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IEEE Computational Intelligence Society Information","authors":"","doi":"10.1109/TG.2023.3310831","DOIUrl":"https://doi.org/10.1109/TG.2023.3310831","url":null,"abstract":"","PeriodicalId":55977,"journal":{"name":"IEEE Transactions on Games","volume":"15 3","pages":"C3-C3"},"PeriodicalIF":2.3,"publicationDate":"2023-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/7782673/10251473/10251490.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68027377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IEEE Transactions on Games Publication Information","authors":"","doi":"10.1109/TG.2023.3310833","DOIUrl":"https://doi.org/10.1109/TG.2023.3310833","url":null,"abstract":"","PeriodicalId":55977,"journal":{"name":"IEEE Transactions on Games","volume":"15 3","pages":"C2-C2"},"PeriodicalIF":2.3,"publicationDate":"2023-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/7782673/10251473/10251491.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68026819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leveraging the OPT Large Language Model for Sentiment Analysis of Game Reviews","authors":"Markos Viggiato;Cor-Paul Bezemer","doi":"10.1109/TG.2023.3313121","DOIUrl":"https://doi.org/10.1109/TG.2023.3313121","url":null,"abstract":"Automatically extracting players' sentiments about games can help game developers to better understand the aspects of their games that players like or dislike. Our prior work showed that traditional sentiment analysis techniques do not perform well on game reviews. However, the natural language processing field has seen steep progress in recent years. In this letter, we follow up on our prior work and investigate how a state-of-the-art large language model (OPT-175B) performs on the sentiment classification of game reviews. We manually analyze the game reviews wrongly classified by OPT-175B to better understand the issues that affect the performance of that model and how those issues compare to the challenges faced by traditional classifiers. We found that OPT-175B achieves (far) better performance than traditional sentiment classifiers, with a 72%-increased F-measure and a 30%-increased AUC compared to the best traditional classifier studied in our prior work. We also found that common challenges of traditional classifiers, such as reviews with game comparisons and negative terminology, have been mostly solved by the OPT-175B model.","PeriodicalId":55977,"journal":{"name":"IEEE Transactions on Games","volume":"16 2","pages":"493-496"},"PeriodicalIF":2.3,"publicationDate":"2023-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62570261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MCMARL: Parameterizing Value Function via Mixture of Categorical Distributions for Multi-Agent Reinforcement Learning","authors":"Jian Zhao;Mingyu Yang;Youpeng Zhao;Xunhan Hu;Wengang Zhou;Houqiang Li","doi":"10.1109/TG.2023.3310150","DOIUrl":"https://doi.org/10.1109/TG.2023.3310150","url":null,"abstract":"In cooperative multi-agent tasks, a team of agents jointly interact with an environment by taking actions, receiving a team reward, and observing the next state. During the interactions, the uncertainty of environment and reward will inevitably induce stochasticity in the long-term returns, and the randomness can be exacerbated with the increasing number of agents. However, such randomness is ignored by most of the existing value-based multi-agent reinforcement learning (MARL) methods, which only model the expectation of Q-value for both the individual agents and the team. Compared to using the expectations of the long-term returns, it is preferable to directly model the stochasticity by estimating the returns through distributions. With this motivation, this article proposes a novel value-based MARL framework from a distributional perspective, i.e., parameterizing value function via Mixture of Categorical distributions for MARL (MCMARL). Specifically, we model both the individual and global Q-values with categorical distribution. To integrate categorical distributions, we define five basic operations on the distribution, which allow the generalization of expected value function factorization methods (e.g., value decomposition networks (VDN) and QMIX) to their MCMARL variants. We further prove that our MCMARL framework satisfies the Distributional-Individual-Global-Max principle with respect to the expectation of distribution, which guarantees the consistency between joint and individual greedy action selections in the global and individual Q-values. Empirically, we evaluate MCMARL on both the stochastic matrix game and the challenging set of StarCraft II micromanagement tasks, showing the efficacy of our framework.","PeriodicalId":55977,"journal":{"name":"IEEE Transactions on Games","volume":"16 3","pages":"556-565"},"PeriodicalIF":1.7,"publicationDate":"2023-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47562284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Reinforcement Learning Using Optimized Monte Carlo Tree Search in EWN","authors":"Yixian Zhang;Zhuoxuan Li;Yiding Cao;Xuan Zhao;Jinde Cao","doi":"10.1109/TG.2023.3308898","DOIUrl":"https://doi.org/10.1109/TG.2023.3308898","url":null,"abstract":"EinStein würfelt nicht! (EWN) is a perfect information stochastic game, in which randomness influences the game process enormously. In this article, we propose an optimized algorithm named Quick Neural Network Tree Search (QNNTS) based on deep reinforcement learning and Monte Carlo tree search (MCTS) to construct the artificial intelligence agent of EWN. Meanwhile, the lightness of the model makes it possible to train with much less computing resources. The optimization structure of the algorithm based on MCTS is named Optimized Upper Confidence Bound Applied to Tree with Heuristic Search, which introduces the expectation valuation strategy into the MCTS. As the prerequisite product of QNNTS, it performs with an improvement of the winning rate. Ultimately, the Attention-ResNet structure combined with domain knowledge is used to obtain the proposed algorithm. Compared with several conventional algorithms, it gains high winning rates of at least 68%.","PeriodicalId":55977,"journal":{"name":"IEEE Transactions on Games","volume":"16 3","pages":"544-555"},"PeriodicalIF":1.7,"publicationDate":"2023-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62570246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multigoal Reinforcement Learning via Exploring Entropy-Regularized Successor Matching","authors":"Xiaoyun Feng;Yun Zhou","doi":"10.1109/TG.2023.3304315","DOIUrl":"https://doi.org/10.1109/TG.2023.3304315","url":null,"abstract":"Multigoal reinforcement learning (RL) algorithms tend to achieve and generalize over diverse goals. However, unlike single-goal agents, multigoal agents struggle to break through the exploration bottleneck with a fair share of interactions, owing to rarely reusable goal-oriented experiences with sparse goal-reaching rewards. Therefore, well-arranged behavior goals during training are essential for multigoal agents, especially in long-horizon tasks. To this end, we propose efficient multigoal exploration on the basis of maximizing the entropy of successor features and Exploring entropy-regularized successor matching, namely, E²SM. E²SM adopts the idea of a successor feature and extends it to entropy-regularized goal-reaching successor mapping that serves as a more stable state feature under sparse rewards. The key contribution of our work is to perform intrinsic goal setting with behavior goals that are more likely to be achieved in terms of future state occupancies as well as promising in expanding the exploration frontier. Experiments on challenging long-horizon manipulation tasks show that E²SM deals well with sparse rewards and in pursuit of maximal state-covering, E²SM efficiently identifies valuable behavior goals toward specific goal-reaching by matching the successor mapping.","PeriodicalId":55977,"journal":{"name":"IEEE Transactions on Games","volume":"15 4","pages":"538-548"},"PeriodicalIF":2.3,"publicationDate":"2023-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62570212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leveraging Joint-Action Embedding in Multiagent Reinforcement Learning for Cooperative Games","authors":"Xingzhou Lou;Junge Zhang;Yali Du;Chao Yu;Zhaofeng He;Kaiqi Huang","doi":"10.1109/TG.2023.3302694","DOIUrl":"https://doi.org/10.1109/TG.2023.3302694","url":null,"abstract":"State-of-the-art multiagent policy gradient (MAPG) methods have demonstrated convincing capability in many cooperative games. However, the exponentially growing joint-action space severely challenges the critic's value evaluation and hinders performance of MAPG methods. To address this issue, we augment Central-Q policy gradient with a joint-action embedding function and propose mutual-information maximization MAPG (M3APG). The joint-action embedding function makes joint-actions contain information of state transitions, which will improve the critic's generalization over the joint-action space by allowing it to infer joint-actions' outcomes. We theoretically prove that with a fixed joint-action embedding function, the convergence of M3APG is guaranteed. Experiment results of the StarCraft multiagent challenge (SMAC) demonstrate that M3APG gives evaluation results with better accuracy and outperforms other MAPG basic models across various maps of multiple difficulty levels. We empirically show that our joint-action embedding model can be extended to value-based multiagent reinforcement learning methods and state-of-the-art MAPG methods. Finally, we run an ablation study to show that the usage of mutual information in our method is necessary and effective.","PeriodicalId":55977,"journal":{"name":"IEEE Transactions on Games","volume":"16 2","pages":"470-482"},"PeriodicalIF":2.3,"publicationDate":"2023-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62570196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved Exploration With Demonstrations in Procedurally-Generated Environments","authors":"Mao Xu;Shuzhi Sam Ge;Dongjie Zhao;Qian Zhao","doi":"10.1109/TG.2023.3299986","DOIUrl":"https://doi.org/10.1109/TG.2023.3299986","url":null,"abstract":"Exploring sparse reward environments remains a major challenge in model-free deep reinforcement learning (RL). State-of-the-art exploration methods address this challenge by utilizing intrinsic rewards to guide exploration in uncertain environment dynamics or novel states. However, these methods fall short in procedurally-generated environments, where the agent is unlikely to visit a state more than once due to the different environments generated in each episode. Recently, imitation-learning-based exploration methods have been proposed to guide exploration in different kinds of procedurally-generated environments by imitating high-quality exploration episodes. However, these methods have weaker exploration capabilities and lower sample efficiency in complex procedurally-generated environments. Motivated by the fact that demonstrations can guide exploration in sparse reward environments, we propose improved exploration with demonstrations (IEWD), an improved imitation-learning-based exploration method in procedurally-generated environments, which utilizes demonstrations from these environments. IEWD assigns different episode-level exploration scores to each demonstration episode and generated episode. IEWD then ranks these episodes based on their scores and stores highly-scored episodes into a small ranking buffer. IEWD treats these highly-scored episodes as good exploration episodes and makes the deep RL agent imitate exploration behaviors from the ranking buffer to reproduce exploration behaviors from good exploration episodes. Additionally, IEWD adopts the experience replay buffer to store generated positive episodes and demonstrations and employs self-imitation learning to utilize experiences from the experience replay buffer to optimize the policy of the deep RL agent. We evaluate our method IEWD on several procedurally-generated MiniGrid environments and 3-D maze environments from MiniWorld. The results show that IEWD significantly outperforms existing learning from demonstration methods and exploration methods, including state-of-the-art imitation-learning-based exploration methods, in terms of sample efficiency and final performance in complex procedurally-generated environments.","PeriodicalId":55977,"journal":{"name":"IEEE Transactions on Games","volume":"16 3","pages":"530-543"},"PeriodicalIF":1.7,"publicationDate":"2023-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62570339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}