{"title":"Fun as Moderate Divergence: Evaluating Experience-Driven PCG via RL","authors":"Ziqi Wang;Yuchen Li;Haocheng Du;Jialin Liu;Georgios N. Yannakakis","doi":"10.1109/TG.2024.3456101","DOIUrl":"10.1109/TG.2024.3456101","url":null,"abstract":"The computational modeling of player experience is key to the generation of personalized game content. The notion of <italic>fun</i>, as one of the most peculiar and core aspects of game experience, has often been modeled and quantified for the purpose of content generation with varying success. Recently, measures of a player's <italic>fun</i> have been ad-hoc designed to model moderate levels of in-game divergence in platformer games, inspired by Koster's <italic>theory of fun</i>. Such measures have shaped the reward functions of game content generative methods following the <italic>experience-driven procedural content generation via reinforcement learning</i> (EDRL) paradigm in <italic>Super Mario Bros</i> In this article, we present a comprehensive user study involving over 90 participants with a dual purpose: to evaluate the ad-hoc <italic>fun</i> metrics introduced in the literature and test the effectiveness of the EDRL framework to generate personalized <italic>fun</i> <italic>Super Mario Bros</i> experiences in an online fashion. Our key findings suggest that moderate degrees of game level and gameplay divergence are highly consistent with the perceived notion of <italic>fun</i> of our participants, cross-verifying the ad-hoc designed <italic>fun</i> metrics. On the other hand, it appears that EDRL generators manage to match the preferred (i.e., <italic>fun</i>) game experiences of each persona, only in part and for some players. Our findings suggest that the use of multifaceted in-game data, such as events and actions, will likely enable the modeling of more nuanced gameplay behaviors. In addition, the verification of player persona modeling and the enhancement of player engagement through dynamic experience modelling are suggested as potential future directions.","PeriodicalId":55977,"journal":{"name":"IEEE Transactions on Games","volume":"17 2","pages":"360-373"},"PeriodicalIF":1.7,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142218413","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Adversarially Guided Actor–Critic","authors":"Mao Xu;Shuzhi Sam Ge;Dongjie Zhao;Qian Zhao","doi":"10.1109/TG.2024.3453444","DOIUrl":"10.1109/TG.2024.3453444","url":null,"abstract":"Exploring procedurally-generated environments presents a formidable challenge in model-free deep reinforcement learning (RL). One state-of-the-art exploration method, adversarially guided actor–critic (AGAC), employs adversarial learning to drive exploration by diversifying the actions of the deep RL agent. Specifically, in the actor–critic (AC) framework, which consists of a policy (the actor) and a value function (the critic), AGAC introduces an adversary that mimics the actor. AGAC then constructs an action-based adversarial advantage (ABAA) to update the actor. This ABAA guides the deep RL agent toward actions that diverge from the adversary's predictions while maximizing expected returns. Although the ABAA drives AGAC to explore procedurally-generated environments, it can affect the balance between exploration and exploitation during the training period, thereby impairing AGAC's performance. To mitigate this adverse effect and improve AGAC's performance, we propose efficient adversarially guided actor–critic (EAGAC). EAGAC introduces a state-based adversarial advantage (SBAA) that directs the deep RL agent toward actions leading to states with different action distributions from those of the adversary while maximizing expected returns. EAGAC combines this SBAA with the ABAA to form a joint adversarial advantage, and then employs this joint adversarial advantage to update the actor. To further reduce this adverse effect and enhance performance, EAGAC stores past positive episodes in the replay buffer and utilizes experiences sampled from this buffer to optimize the actor through self-imitation learning (SIL). The experimental results in procedurally-generated environments from MiniGrid and the 3-D navigation environment from ViZDoom show our EAGAC method significantly outperforms AGAC and other state-of-the-art exploration methods in both sample efficiency and final performance.","PeriodicalId":55977,"journal":{"name":"IEEE Transactions on Games","volume":"17 2","pages":"346-359"},"PeriodicalIF":1.7,"publicationDate":"2024-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142218415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Individual Potential-Based Rewards in Multiagent Reinforcement Learning","authors":"Chen Yang;Pei Xu;Junge Zhang","doi":"10.1109/TG.2024.3450475","DOIUrl":"10.1109/TG.2024.3450475","url":null,"abstract":"A great challenge for applying multiagent reinforcement learning (MARL) in the field of game artificial intelligence (AI) is to enable agents to learn diversified policies to handle different game-specific problems, while receiving only a shared team reward. At present, a common approach is reward shaping, which focuses on designing rewards for agents to guide cooperation. However, most of the existing methods require prior knowledge on the environment for reward design or alter the optimal policies after imposing extra rewards. Besides, previous MARL methods that rely on manually designed rewards can hardly generalize across different game environments. To this end, we propose a new MARL method that learns individual potential-based rewards for agents. Specifically, we learn a parameterized potential function for each agent to generate individual rewards in the discounted temporal difference form. The whole update procedure is modeled as the bilevel optimization problem, where the lower level is to optimize policies with potential-based rewards, and the upper level is to optimize parameterized potential functions toward maximizing the environment return. We theoretically prove that the individual potential-based rewards can guarantee policy invariance for agents, so that the optimization objective is consistent with the original MARL problem. We evaluate our method with a number of existing state-of-the-art MARL methods on predator–prey and <italic>StarCraft II</i> game environments. Empirical results show that our proposed method significantly outperforms baseline methods and achieves better game AI that enjoys high performance and generalization.","PeriodicalId":55977,"journal":{"name":"IEEE Transactions on Games","volume":"17 2","pages":"334-345"},"PeriodicalIF":1.7,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142218416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"STEP: A Framework for Automated Point Cost Estimation","authors":"George E.M. Long;Diego Perez-Liebana;Spyridon Samothrakis","doi":"10.1109/TG.2024.3450992","DOIUrl":"10.1109/TG.2024.3450992","url":null,"abstract":"In miniature wargames, such as \u0000<italic>Warhammer 40k</i>\u0000, players control asymmetrical armies, which include multiple units of different types and strengths. These games often use point costs to balance the armies. Each unit is assigned a point cost, and players have a budget they can spend on units. Calculating accurate point costs can be a tedious manual process, with iterative playtests required. If these point costs do not represent a units true power, the game can get unbalanced as overpowered units can have low point costs. In our previous paper, we proposed an automated way of estimating the point costs using a linear regression approach. We used a turn-based asymmetrical wargame called \u0000<italic>Wizard Wars</i>\u0000 to test our methods. Players were simulated using Monte Carlo tree search, using different heuristics to represent playstyles. We presented six variants of our method, and show that one method was able to reduce the unbalanced nature of the game by almost half. For this article, we introduce a framework called simple testing and evaluation of points, which allows for further and more granular analysis of point cost estimating methods, by providing a fast, simple, and configurable framework to test methods with. Finally, we compare how our methods do in \u0000<italic>Wizard Wars</i>\u0000 against expertly chosen point costs.","PeriodicalId":55977,"journal":{"name":"IEEE Transactions on Games","volume":"16 4","pages":"927-936"},"PeriodicalIF":1.7,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142218210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Conditional Level Generation Using Automated Validation in Match-3 Games","authors":"Monica Villanueva Aylagas;Joakim Bergdahl;Jonas Gillberg;Alessandro Sestini;Theodor Tolstoy;Linus Gisslén","doi":"10.1109/TG.2024.3440214","DOIUrl":"10.1109/TG.2024.3440214","url":null,"abstract":"Generative models for level generation have shown great potential in game production. However, they often provide limited control over the generation, and the validity of the generated levels is unreliable. Despite this fact, only a few approaches that learn from existing data provide the users with ways of controlling the generation, simultaneously addressing the generation of unsolvable levels. This article proposes autovalidated level generation, a novel method to improve models that learn from existing level designs using difficulty statistics extracted from gameplay. In particular, we use a conditional variational autoencoder to generate layouts for match-3 levels, conditioning the model on precollected statistics, such as game mechanics like difficulty, and relevant visual features, such as size and symmetry. Our method is general enough that multiple approaches could potentially be used to generate these statistics. We quantitatively evaluate our approach by comparing it to an ablated model without difficulty conditioning. In addition, we analyze both quantitatively and qualitatively whether the style of the dataset is preserved in the generated levels. Our approach generates more valid levels than the same method without difficulty conditioning.","PeriodicalId":55977,"journal":{"name":"IEEE Transactions on Games","volume":"16 4","pages":"783-792"},"PeriodicalIF":1.7,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141944749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Relation-Aware Learning for Multitask Multiagent Cooperative Games","authors":"Yang Yu;Likun Yang;Zhourui Guo;Yongjian Ren;Qiyue Yin;Junge Zhang;Kaiqi Huang","doi":"10.1109/TG.2024.3436871","DOIUrl":"10.1109/TG.2024.3436871","url":null,"abstract":"Collaboration among multiple tasks is advantageous for enhancing learning efficiency in multiagent reinforcement learning. To guide agents in cooperating with different teammates in multiple tasks, contemporary approaches encourage agents to exploit common cooperative patterns or identify the learning priorities of multiple tasks. Despite the progress made by these methods, they all assume that all cooperative tasks to be learned are related and desire similar agent policies. This is rarely the case in multiagent cooperation, where minor changes in team composition can lead to significant variations in cooperation, resulting in distinct cooperative strategies compete for limited learning resources. In this article, to tackle the challenge posed by multitask learning in potentially competing cooperative tasks, we propose a novel framework called relation-aware learning (RAL). RAL incorporates a relation awareness module in both task representation and task optimization, aiding in reasoning about task relationships and mitigating negative transfers among dissimilar tasks. To assess the performance of RAL, we conduct a comparative analysis with baseline methods in a multitask <italic>StarCraft</i> environment. The results demonstrate the superiority of RAL in multitask cooperative scenarios, particularly in scenarios involving multiple conflicting tasks.","PeriodicalId":55977,"journal":{"name":"IEEE Transactions on Games","volume":"17 2","pages":"322-333"},"PeriodicalIF":1.7,"publicationDate":"2024-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141882419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detecting Discrepancies Between Subtitles and Audio in Gameplay Videos With EchoTest","authors":"Ian Gauk;Cor-Paul Bezemer","doi":"10.1109/TG.2024.3435799","DOIUrl":"10.1109/TG.2024.3435799","url":null,"abstract":"The landscape of accessibility features in video games remains inconsistent, posing challenges for gamers who seek experiences tailored to their needs. Accessibility features, such as subtitles are widely used by players but are difficult to test manually due to the large scope of games and the variability in how subtitles can appear. In this article, we introduce an automated approach (<sc>EchoTest</small>) to extract subtitles and spoken audio from a gameplay video, convert them into text, and compare them to detect discrepancies, such as typos, desynchronization, and missing text. <sc>EchoTest</small> can be used by game developers to identify discrepancies between subtitles and spoken audio in their games, enabling them to better test the accessibility of their games. In an empirical study on gameplay videos from 15 popular games, <sc>EchoTest</small> can verify discrepancies between subtitles and audio with a precision of 98% and a recall of 89%. In addition, <sc>EchoTest</small> performs well with a precision of 73% and a recall of 99% on a challenging generated benchmark.","PeriodicalId":55977,"journal":{"name":"IEEE Transactions on Games","volume":"17 1","pages":"224-234"},"PeriodicalIF":1.7,"publicationDate":"2024-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141863123","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Biosignal Contrastive Representation Learning for Emotion Recognition of Game Users","authors":"Rongyang Li;Jianguo Ding;Huansheng Ning","doi":"10.1109/TG.2024.3435339","DOIUrl":"10.1109/TG.2024.3435339","url":null,"abstract":"Biosignal representation learning (BRL) plays a crucial role in emotion recognition for game users (ERGU). Unsupervised BRL has garnered attention considering the difficulty in obtaining ground truth emotion labels from game users. However, unsupervised BRL in ERGU faces challenges, including overfitting caused by limited data and performance degradation due to unbalanced sample distributions. Faced with the above challenges, we propose a novel method of biosignal contrastive representation learning (BCRL) for ERGU, which not only serves as a unified representation learning approach applicable to various modalities of biosignals but also derives generalized biosignals representations suitable for different downstream tasks. Specifically, we solve the overfitting by introducing perturbations at the embedding layer based on the projected gradient descent (PGD) adversarial attacks and develop the sample balancing strategy (SBS) to mitigate the negative impact of the unbalanced sample on the performance. Further, we have conducted comprehensive validation experiments on the public dataset, yielding the following key observations: first BCRL outperforms all other methods, achieving average accuracies of 76.67%, 71.83%, and 63.58% in 1D-2 C Valence, 1D-2 C Arousal, and 2D-4 C Valence/Arousal, respectively; second, the ablation study shows that both the PGD module (+7.58% in accuracy on average) and the SBS module (+14.60% in accuracy on average) have a positive effect on the performance of different classifications; third, BCRL model exhibits the certain generalization ability across the different games, subjects and classifiers.","PeriodicalId":55977,"journal":{"name":"IEEE Transactions on Games","volume":"17 2","pages":"308-321"},"PeriodicalIF":1.7,"publicationDate":"2024-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141863122","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"More Human-Like Gameplay by Blending Policies From Supervised and Reinforcement Learning","authors":"Tatsuyoshi Ogawa;Chu-Hsuan Hsueh;Kokolo Ikeda","doi":"10.1109/TG.2024.3424668","DOIUrl":"10.1109/TG.2024.3424668","url":null,"abstract":"Modeling human players' behaviors in games is a key challenge for making natural computer players, evaluating games, and generating content. To achieve better human–computer interaction, researchers have tried various methods to create human-like artificial intelligence. In chess and \u0000<italic>Go</i>\u0000, supervised learning with deep neural networks is known as one of the most effective ways to predict human moves. However, for many other games (e.g., \u0000<italic>Shogi</i>\u0000), it is hard to collect a similar amount of game records, resulting in poor move-matching accuracy of the supervised learning. We propose a method to compensate for the weakness of the supervised learning policy by Blending it with an AlphaZero-like reinforcement learning policy. Experiments on \u0000<italic>Shogi</i>\u0000 showed that the Blend method significantly improved the move-matching accuracy over supervised learning models. Experiments on chess and \u0000<italic>Go</i>\u0000 with a limited number of game records also showed similar results. The Blend method was effective with both medium and large numbers of games, particularly the medium case. We confirmed the robustness of the Blend model to the parameter and discussed the mechanism why the move-matching accuracy improves. In addition, we showed that the Blend model performed better than existing work that tried to improve the move-matching accuracy.","PeriodicalId":55977,"journal":{"name":"IEEE Transactions on Games","volume":"16 4","pages":"831-843"},"PeriodicalIF":1.7,"publicationDate":"2024-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10595450","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141611394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Progress-Based Algorithm for Interpretable Reinforcement Learning in Regression Testing","authors":"Pablo Gutiérrez-Sánchez;Marco A. Gómez-Martín;Pedro A. González-Calero;Pedro P. Gómez-Martín","doi":"10.1109/TG.2024.3426601","DOIUrl":"10.1109/TG.2024.3426601","url":null,"abstract":"In video games, the validation of design specifications throughout the development process poses a major challenge as the project grows in complexity and scale and purely manual testing becomes very costly. This article proposes a new approach to design validation regression testing based on a reinforcement learning technique guided by tasks expressed in a formal logic specification language (truncated linear temporal logic) and the progress made in completing these tasks. This requires no prior knowledge of machine learning to train testing bots, is naturally interpretable and debuggable, and produces dense reward functions without the need for reward shaping. We investigate the validity of our strategy by comparing it to an imitation baseline in experiments organized around three use cases of typical scenarios in commercial video games on a 3-D stealth testing environment created in unity. For each scenario, we analyze the agents' reactivity to modifications in common assets to accommodate design needs in other sections of the game, and their ability to report unexpected gameplay variations. Our experiments demonstrate the practicality of our approach for training bots to conduct automated regression testing in complex video game settings.","PeriodicalId":55977,"journal":{"name":"IEEE Transactions on Games","volume":"16 4","pages":"844-853"},"PeriodicalIF":1.7,"publicationDate":"2024-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10595449","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141614866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}