{"title":"通过行为心理学启发的可变奖励方案提高强化学习性能","authors":"Heena Rathore, Henry Griffith","doi":"10.1109/SMARTCOMP58114.2023.00050","DOIUrl":null,"url":null,"abstract":"Reinforcement learning (RL) algorithms employ a fixed-ratio schedule which can lead to overfitting, where the agent learns to optimize for the specific rewards it receives, rather than learning the underlying task. Further, the agent can simply repeat the same actions that have worked in the past and do not explore different actions and strategies to see what works best. This leads to generalization issue, where the agent struggles to apply what it has learned to new, unseen situations. This can be particularly problematic in complex environments where the agent needs to learn to generalize from limited data. Introducing variable reward schedules in RL inspired from behavioral psychology can be more effective than traditional reward schemes because they can mimic real-world environments where rewards are not always consistent or predictable. This can also encourage an RL agent to explore more and become more adaptable to changes in the environment. The simulation results showed that variable reward scheme has faster learning rate as compared to fixed rewards.","PeriodicalId":163556,"journal":{"name":"2023 IEEE International Conference on Smart Computing (SMARTCOMP)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Improving Reinforcement Learning Performance through a Behavioral Psychology-Inspired Variable Reward Scheme\",\"authors\":\"Heena Rathore, Henry Griffith\",\"doi\":\"10.1109/SMARTCOMP58114.2023.00050\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Reinforcement learning (RL) algorithms employ a fixed-ratio schedule which can lead to overfitting, where the agent learns to optimize for the specific rewards it receives, rather than learning the underlying task. Further, the agent can simply repeat the same actions that have worked in the past and do not explore different actions and strategies to see what works best. This leads to generalization issue, where the agent struggles to apply what it has learned to new, unseen situations. This can be particularly problematic in complex environments where the agent needs to learn to generalize from limited data. Introducing variable reward schedules in RL inspired from behavioral psychology can be more effective than traditional reward schemes because they can mimic real-world environments where rewards are not always consistent or predictable. This can also encourage an RL agent to explore more and become more adaptable to changes in the environment. 
The simulation results showed that variable reward scheme has faster learning rate as compared to fixed rewards.\",\"PeriodicalId\":163556,\"journal\":{\"name\":\"2023 IEEE International Conference on Smart Computing (SMARTCOMP)\",\"volume\":\"47 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 IEEE International Conference on Smart Computing (SMARTCOMP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SMARTCOMP58114.2023.00050\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE International Conference on Smart Computing (SMARTCOMP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SMARTCOMP58114.2023.00050","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Improving Reinforcement Learning Performance through a Behavioral Psychology-Inspired Variable Reward Scheme
Reinforcement learning (RL) algorithms typically employ a fixed-ratio reward schedule, which can lead to overfitting: the agent learns to optimize for the specific rewards it receives rather than learning the underlying task. Moreover, the agent may simply repeat actions that have worked in the past instead of exploring alternative actions and strategies to find what works best. The result is a generalization problem, in which the agent struggles to apply what it has learned to new, unseen situations; this is particularly troublesome in complex environments where the agent must generalize from limited data. Introducing variable reward schedules in RL, inspired by behavioral psychology, can be more effective than traditional reward schemes because such schedules mimic real-world environments in which rewards are not always consistent or predictable. They can also encourage an RL agent to explore more and to adapt to changes in the environment. Simulation results show that the variable reward scheme achieves a faster learning rate than fixed rewards.
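The abstract contrasts fixed-ratio and variable-ratio reward schedules but does not include an implementation. As a minimal sketch of the idea, the following Python snippet trains a stateless epsilon-greedy Q-learning agent on a toy bandit task under both schedules: the fixed-ratio schedule rewards every Nth correct action, while the variable-ratio schedule rewards after a random number of correct actions with the same mean. All names and parameter values here are illustrative assumptions, not the authors' code.

```python
# Illustrative sketch only: compare a fixed-ratio reward schedule
# (reward every RATIO-th correct action) against a variable-ratio
# schedule (reward after a random count of correct actions whose
# mean is RATIO) on a simple multi-armed bandit.
import random

N_ARMS = 5
BEST_ARM = 2   # the arm the agent should learn to prefer (assumed)
RATIO = 3      # mean number of correct actions per reward (assumed)
EPSILON = 0.1  # exploration rate
ALPHA = 0.1    # learning rate
STEPS = 5000

def run(schedule: str, seed: int = 0) -> list[float]:
    """Return the learned Q-values after training under `schedule`."""
    rng = random.Random(seed)
    q = [0.0] * N_ARMS
    correct_since_reward = 0
    # Variable-ratio: draw the next reward threshold at random
    # (uniform on 1..2*RATIO-1, so its mean equals RATIO).
    threshold = RATIO if schedule == "fixed" else rng.randint(1, 2 * RATIO - 1)
    for _ in range(STEPS):
        # Epsilon-greedy action selection.
        if rng.random() < EPSILON:
            arm = rng.randrange(N_ARMS)
        else:
            arm = max(range(N_ARMS), key=lambda a: q[a])
        reward = 0.0
        if arm == BEST_ARM:
            correct_since_reward += 1
            if correct_since_reward >= threshold:
                reward = 1.0
                correct_since_reward = 0
                if schedule == "variable":
                    threshold = rng.randint(1, 2 * RATIO - 1)
        # Standard Q-learning update for a stateless bandit.
        q[arm] += ALPHA * (reward - q[arm])
    return q

print("fixed   :", [round(v, 3) for v in run("fixed")])
print("variable:", [round(v, 3) for v in run("variable")])
```

Under the variable-ratio schedule the reward signal is noisier from step to step, which, per the paper's argument, discourages the agent from locking onto a predictable reward pattern and keeps exploration active for longer.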