{"title":"在战斗场景中减轻强化学习代理的怯懦","authors":"Steve Bakos, Heidar Davoudi","doi":"10.1109/CoG51982.2022.9893546","DOIUrl":null,"url":null,"abstract":"A common approach in reinforcement learning (RL) is to give the agent a static reward for successfully completing the task or punishing it for failing. However, this approach leads to a behaviour similar to fear in combat scenarios. It learns a sub-optimal policy improving over time while retaining elements of cowardice in updating the policy. Cowardice can be avoided by removing static rewards given to the agent at the terminal state, but this lack of reward can negatively affect performance. This paper presents a novel approach to solve these issues by decaying this reward or punishment based on the agent’s performance at the terminal state and evaluates the proposed method across three separate games of varying levels of complexity—The Legend of Zelda, Megaman X, and M.U.G.E.N. All three games are based on combat scenarios where the goal is to defeat the opponent by reducing its health to zero. In all environments, the agents receiving decayed reward and punishment are more stable when training, achieve higher win rates, and require fewer actions per game than their statically rewarded counterparts.","PeriodicalId":394281,"journal":{"name":"2022 IEEE Conference on Games (CoG)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Mitigating Cowardice for Reinforcement Learning Agents in Combat Scenarios\",\"authors\":\"Steve Bakos, Heidar Davoudi\",\"doi\":\"10.1109/CoG51982.2022.9893546\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A common approach in reinforcement learning (RL) is to give the agent a static reward for successfully completing the task or punishing it for failing. However, this approach leads to a behaviour similar to fear in combat scenarios. It learns a sub-optimal policy improving over time while retaining elements of cowardice in updating the policy. Cowardice can be avoided by removing static rewards given to the agent at the terminal state, but this lack of reward can negatively affect performance. This paper presents a novel approach to solve these issues by decaying this reward or punishment based on the agent’s performance at the terminal state and evaluates the proposed method across three separate games of varying levels of complexity—The Legend of Zelda, Megaman X, and M.U.G.E.N. All three games are based on combat scenarios where the goal is to defeat the opponent by reducing its health to zero. 
In all environments, the agents receiving decayed reward and punishment are more stable when training, achieve higher win rates, and require fewer actions per game than their statically rewarded counterparts.\",\"PeriodicalId\":394281,\"journal\":{\"name\":\"2022 IEEE Conference on Games (CoG)\",\"volume\":\"17 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-08-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE Conference on Games (CoG)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CoG51982.2022.9893546\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE Conference on Games (CoG)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CoG51982.2022.9893546","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Mitigating Cowardice for Reinforcement Learning Agents in Combat Scenarios
A common approach in reinforcement learning (RL) is to give the agent a static reward for successfully completing a task and a static punishment for failing. In combat scenarios, however, this approach produces behaviour resembling fear: the agent learns a sub-optimal policy that improves over time yet retains elements of cowardice as the policy is updated. Cowardice can be avoided by removing the static reward given at the terminal state, but the absence of that reward can degrade performance. This paper presents a novel approach that addresses both issues by decaying the terminal reward or punishment according to the agent's performance at the terminal state, and evaluates the proposed method on three games of varying complexity: The Legend of Zelda, Megaman X, and M.U.G.E.N. All three are combat scenarios in which the goal is to defeat the opponent by reducing its health to zero. In all environments, agents receiving the decayed reward and punishment train more stably, achieve higher win rates, and require fewer actions per game than their statically rewarded counterparts.
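
The abstract does not give the exact decay schedule, so the Python sketch below shows only one plausible reading: the terminal reward or punishment is scaled by the health margin at the end of the episode. The function name, the health normalisation, and the linear scaling are illustrative assumptions, not the authors' exact formulation.

def decayed_terminal_reward(agent_health: float, opponent_health: float,
                            win_reward: float = 1.0,
                            loss_punishment: float = -1.0) -> float:
    """Hypothetical decayed terminal signal: instead of a fixed +/-1,
    scale the reward or punishment by how decisively the episode ended.
    Health values are assumed to be normalised to [0, 1]."""
    if opponent_health <= 0.0:
        # Win: a dominant victory (high remaining health) keeps most of the reward.
        return win_reward * max(agent_health, 0.0)
    # Loss or timeout: the more damage the agent dealt, the milder the punishment,
    # so engaging the opponent is no longer discouraged (mitigating "cowardice").
    return loss_punishment * max(opponent_health, 0.0)

# Example: a narrow loss (opponent at 10% health) is punished far less than
# avoiding the fight entirely (opponent untouched at full health).
print(decayed_terminal_reward(agent_health=0.0, opponent_health=0.1))  # -0.1
print(decayed_terminal_reward(agent_health=1.0, opponent_health=1.0))  # -1.0

Under this reading, a static punishment makes every loss equally costly, which pushes the agent toward avoidance; decaying the punishment by terminal performance keeps a learning signal at the end of the episode while no longer penalising aggressive play that nearly succeeds.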