Model Representation Considerations for Artificial Intelligence Opponent Development in Combat Games

Sarah Kitchen, Christopher McGroarty, Timothy Aris
{"title":"Model Representation Considerations for Artificial Intelligence Opponent Development in Combat Games","authors":"Sarah Kitchen, Christopher McGroarty, Timothy Aris","doi":"10.32473/flairs.36.133571","DOIUrl":null,"url":null,"abstract":"The performance and behavior of an Artificial Intelligence (AI) opponent in games requires coordination of multiple agents and complex tasks depends on many design choices made during implementation. Currently, gaming agents developed with Reinforcement Learning (RL) methods are constructed to play the game, leading to natural design choices for observations, actions, and rewards that are congruent with a human player's actions and objectives. However, in simulation and serious games, the objective of the implemented opponent should be developed in a way that supports the learning objectives for the user, such as by including additional ground truth environment data in the observation space or action structure. Therefore, the reward structure for the AI needs to incorporate more sophisticated considerations than just whether the game was won or lost by the AI. In this way the design space for opponent AI in these settings is considerably broader than what is traditionally used for RL gaming AI. This paper considers the implications of observation representation and reward design for the AI agent and associated actions in the context of 2-player battlefield-type games that are not strictly zero-sum. Semi-cooperative and fully competitive models are considered. The environment in these games is a spatially extended battlefield in which agents must maneuver their forces to bring them into combat range of each other. The objective of the game is control over a pre-specified location in the game, and combat is executed via Lanchester attrition. We demonstrate the impact of aggregation on stochasticity of the model, where aggregation of the state model is controlled by various entropy-based metrics, as well as on the policy learned by an RL agent. Generalizations to alternative scenarios and objectives are discussed, as well as applications to the development of an AI combat opponent that can cohesively manage its forces over multiple scales.","PeriodicalId":302103,"journal":{"name":"The International FLAIRS Conference Proceedings","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"The International FLAIRS Conference Proceedings","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.32473/flairs.36.133571","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

The performance and behavior of an Artificial Intelligence (AI) opponent in games requiring coordination of multiple agents and complex tasks depend on many design choices made during implementation. Currently, gaming agents developed with Reinforcement Learning (RL) methods are constructed to play the game, leading to natural design choices for observations, actions, and rewards that are congruent with a human player's actions and objectives. However, in simulation and serious games, the objective of the implemented opponent should be developed in a way that supports the learning objectives of the user, such as by including additional ground-truth environment data in the observation space or action structure. Therefore, the reward structure for the AI needs to incorporate more sophisticated considerations than whether the game was won or lost by the AI. In this way, the design space for opponent AI in these settings is considerably broader than what is traditionally used for RL gaming AI. This paper considers the implications of observation representation and reward design for the AI agent and associated actions in the context of two-player battlefield-type games that are not strictly zero-sum. Semi-cooperative and fully competitive models are considered. The environment in these games is a spatially extended battlefield in which agents must maneuver their forces to bring them into combat range of each other. The objective of the game is control of a pre-specified location in the game, and combat is executed via Lanchester attrition. We demonstrate the impact of aggregation, where aggregation of the state model is controlled by various entropy-based metrics, on the stochasticity of the model as well as on the policy learned by an RL agent. Generalizations to alternative scenarios and objectives are discussed, as well as applications to the development of an AI combat opponent that can cohesively manage its forces over multiple scales.
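The abstract names two technical ingredients, combat resolved by Lanchester attrition and state aggregation governed by entropy-based metrics, but the paper text is not reproduced here, so the sketch below only illustrates those ideas under common assumptions: square-law Lanchester attrition and Shannon entropy over grid-cell occupancy. The function names, coefficients, and grid discretization are hypothetical and are not the authors' implementation.

```python
import numpy as np


def lanchester_step(red, blue, alpha, beta, dt=1.0):
    """One step of square-law Lanchester attrition between two engaged forces.

    red, blue: current force strengths.
    alpha, beta: effectiveness coefficients (blue attrits red at rate
    alpha * blue; red attrits blue at rate beta * red).
    """
    red_next = max(red - dt * alpha * blue, 0.0)
    blue_next = max(blue - dt * beta * red, 0.0)
    return red_next, blue_next


def occupancy_entropy(force_counts):
    """Shannon entropy (bits) of a force's distribution over battlefield cells.

    Low entropy indicates the force is concentrated in few cells, so a coarser
    (more aggregated) state representation loses little information.
    """
    counts = np.asarray(force_counts, dtype=float)
    total = counts.sum()
    if total <= 0:
        return 0.0
    p = counts[counts > 0] / total
    return float(-(p * np.log2(p)).sum())


# Example: five attrition steps, then an entropy check on red's spatial spread.
red, blue = 100.0, 80.0
for _ in range(5):
    red, blue = lanchester_step(red, blue, alpha=0.05, beta=0.06)
print(round(red, 1), round(blue, 1))

grid_counts = [40, 35, 15, 5, 5, 0, 0, 0]  # hypothetical red units per cell
print(round(occupancy_entropy(grid_counts), 3))
```

In an RL environment built along these lines, a metric like `occupancy_entropy` could decide how coarsely to aggregate the observation of each force, while `lanchester_step` would resolve combat whenever opposing forces come within range.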