Mitigating Cowardice for Reinforcement Learning Agents in Combat Scenarios

Steve Bakos, Heidar Davoudi
{"title":"在战斗场景中减轻强化学习代理的怯懦","authors":"Steve Bakos, Heidar Davoudi","doi":"10.1109/CoG51982.2022.9893546","DOIUrl":null,"url":null,"abstract":"A common approach in reinforcement learning (RL) is to give the agent a static reward for successfully completing the task or punishing it for failing. However, this approach leads to a behaviour similar to fear in combat scenarios. It learns a sub-optimal policy improving over time while retaining elements of cowardice in updating the policy. Cowardice can be avoided by removing static rewards given to the agent at the terminal state, but this lack of reward can negatively affect performance. This paper presents a novel approach to solve these issues by decaying this reward or punishment based on the agent’s performance at the terminal state and evaluates the proposed method across three separate games of varying levels of complexity—The Legend of Zelda, Megaman X, and M.U.G.E.N. All three games are based on combat scenarios where the goal is to defeat the opponent by reducing its health to zero. In all environments, the agents receiving decayed reward and punishment are more stable when training, achieve higher win rates, and require fewer actions per game than their statically rewarded counterparts.","PeriodicalId":394281,"journal":{"name":"2022 IEEE Conference on Games (CoG)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Mitigating Cowardice for Reinforcement Learning Agents in Combat Scenarios\",\"authors\":\"Steve Bakos, Heidar Davoudi\",\"doi\":\"10.1109/CoG51982.2022.9893546\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A common approach in reinforcement learning (RL) is to give the agent a static reward for successfully completing the task or punishing it for failing. However, this approach leads to a behaviour similar to fear in combat scenarios. It learns a sub-optimal policy improving over time while retaining elements of cowardice in updating the policy. Cowardice can be avoided by removing static rewards given to the agent at the terminal state, but this lack of reward can negatively affect performance. This paper presents a novel approach to solve these issues by decaying this reward or punishment based on the agent’s performance at the terminal state and evaluates the proposed method across three separate games of varying levels of complexity—The Legend of Zelda, Megaman X, and M.U.G.E.N. All three games are based on combat scenarios where the goal is to defeat the opponent by reducing its health to zero. 
In all environments, the agents receiving decayed reward and punishment are more stable when training, achieve higher win rates, and require fewer actions per game than their statically rewarded counterparts.\",\"PeriodicalId\":394281,\"journal\":{\"name\":\"2022 IEEE Conference on Games (CoG)\",\"volume\":\"17 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-08-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE Conference on Games (CoG)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CoG51982.2022.9893546\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE Conference on Games (CoG)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CoG51982.2022.9893546","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

A common approach in reinforcement learning (RL) is to give the agent a static reward for successfully completing the task or to punish it for failing. In combat scenarios, however, this approach leads to behaviour resembling fear: the agent learns a sub-optimal policy that improves over time while retaining elements of cowardice as the policy is updated. Cowardice can be avoided by removing the static reward given to the agent at the terminal state, but the absence of this reward can hurt performance. This paper presents a novel approach that addresses both issues by decaying the reward or punishment based on the agent's performance at the terminal state, and evaluates the proposed method across three games of varying complexity: The Legend of Zelda, Megaman X, and M.U.G.E.N. All three games are combat scenarios in which the goal is to defeat the opponent by reducing its health to zero. In all environments, the agents receiving decayed reward and punishment train more stably, achieve higher win rates, and require fewer actions per game than their statically rewarded counterparts.
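The abstract describes decaying the terminal reward or punishment according to the agent's performance at the terminal state, but does not give the exact decay rule. Below is a minimal sketch of the general idea, assuming a hypothetical rule that scales the terminal signal by remaining-health ratios; the function name, signature, and scaling rule are illustrative assumptions, not the authors' method.

```python
# Illustrative sketch only: the abstract does not specify the decay rule,
# so scaling the terminal signal by health ratios is an assumption here.

def terminal_reward(agent_hp: float, agent_max_hp: float,
                    opponent_hp: float, opponent_max_hp: float,
                    base_reward: float = 1.0, base_penalty: float = -1.0) -> float:
    """Return a terminal reward/punishment decayed by end-of-episode performance.

    Instead of a fixed +1 for a win and -1 for a loss, the terminal signal is
    scaled by how decisively the episode ended (hypothetical rule).
    """
    if opponent_hp <= 0:
        # Win: the reward decays toward 0 as the agent finishes with less health.
        return base_reward * (agent_hp / agent_max_hp)
    # Loss or timeout: the punishment decays toward 0 as the opponent is left
    # with less health, so near-wins are punished less harshly than clear losses.
    return base_penalty * (opponent_hp / opponent_max_hp)


# Example: a narrow win (agent at 20% health) yields a small positive reward,
# while a near-win loss (opponent at 10% health) yields a small punishment.
print(terminal_reward(agent_hp=20, agent_max_hp=100,
                      opponent_hp=0, opponent_max_hp=100))   # 0.2
print(terminal_reward(agent_hp=0, agent_max_hp=100,
                      opponent_hp=10, opponent_max_hp=100))  # -0.1
```

Under such a rule the agent is no longer pushed toward the purely evasive behaviour that a fixed terminal punishment encourages, since losses that come close to winning are penalized only mildly.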