{"title":"Implementation of Language-Action Reward Network in Reinforcement Learning by Using Natural Language","authors":"S. Keerthi","doi":"10.1109/INDISCON50162.2020.00026","DOIUrl":null,"url":null,"abstract":"Key issue in many RL approaches is deferred (long-term and postponed) rewards, this leads to difficulties in learning for an agent. Inspite of fact certain “reward shaping” deals with goal of making process of learning quick and easy (by starting process with an agent assigned with extra input), it leads to come complexity. Ongoing RL approaches have indicated complexity of implementation (example Atari games). So, to avoid such issues Potential-based reward shaping (PBRS) is used to increase performance of RL gents. It is an adaptable strategy to provide fundamental foundation data combining with temporal (time-based) distinction learning in a principled manner. In this PBRS approach, we propose a system LEARN (LanguagEAction Reward Network), certain maps common language (human understandable) to middle (intermediate) rewards based on activities of agent. These intermediate language based rewards canister be integrated (combined) into any standard RL algorithm. Experiments abide run on a grid world and a more complex LanguagE-Action Reward Network (LEARN) a framework certain show certain we canister learn tasks significantly faster when we specify intuitive priors on reward distribution.","PeriodicalId":371571,"journal":{"name":"2020 IEEE India Council International Subsections Conference (INDISCON)","volume":"76 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE India Council International Subsections Conference (INDISCON)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/INDISCON50162.2020.00026","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
A key issue in many RL approaches is deferred (long-term, postponed) rewards, which makes learning difficult for an agent. Although "reward shaping" aims to make the learning process quicker and easier (by giving the agent an extra reward signal from the start), it introduces some complexity of its own, and recent RL benchmarks (e.g., Atari games) have shown how hard such shaping can be to implement. To avoid these issues, potential-based reward shaping (PBRS) is used to improve the performance of RL agents: it is a flexible strategy for injecting background knowledge into temporal-difference (time-based) learning in a principled manner. Within this PBRS approach, we propose LEARN (LanguagE-Action Reward Network), a system that maps natural (human-understandable) language to intermediate rewards based on the agent's actions. These intermediate language-based rewards can be integrated into any standard RL algorithm. Experiments were run on a grid world and a more complex domain, and they show that with the LEARN framework, tasks can be learned significantly faster when intuitive priors on the reward distribution are specified via language.
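For readers unfamiliar with PBRS, the sketch below illustrates the core idea under stated assumptions: tabular Q-learning on a toy 5x5 grid world where the shaping term F(s, s') = γΦ(s') − Φ(s) is added to the environment reward. The function `language_potential` is a hypothetical stand-in for LEARN's language-to-reward network (here just a negative Manhattan distance to an assumed goal cell); it is not the paper's implementation, and all names are illustrative.

```python
# Minimal PBRS sketch: tabular Q-learning on a 5x5 grid world.
# `language_potential` is a hypothetical placeholder for a learned
# language-to-reward model in the spirit of LEARN, NOT the paper's code.
import random
from collections import defaultdict

GAMMA, ALPHA, EPS = 0.99, 0.1, 0.1
GOAL = (4, 4)                                  # assumed goal cell
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]   # right, left, down, up

def step(state, action):
    """Deterministic grid dynamics: move, clip to the 5x5 bounds,
    reward 1.0 only on reaching the goal (a sparse, deferred reward)."""
    x = min(max(state[0] + action[0], 0), 4)
    y = min(max(state[1] + action[1], 0), 4)
    s_next = (x, y)
    return s_next, (1.0 if s_next == GOAL else 0.0), s_next == GOAL

def language_potential(state):
    """Hypothetical stand-in for LEARN's language-to-reward mapping:
    a learned network would score how consistent the agent's behaviour
    is with an instruction such as "go to the bottom-right corner".
    Here a negative Manhattan distance plays that role for illustration."""
    return -(abs(state[0] - GOAL[0]) + abs(state[1] - GOAL[1]))

def shaped(r_env, s, s_next):
    """PBRS: r + gamma * phi(s') - phi(s). Because the shaping term is
    a potential difference, it leaves the optimal policy unchanged."""
    return r_env + GAMMA * language_potential(s_next) - language_potential(s)

Q = defaultdict(float)
for episode in range(500):
    s = (0, 0)
    for t in range(200):  # cap episode length
        if random.random() < EPS:                           # explore
            a = random.choice(ACTIONS)
        else:                                               # exploit
            a = max(ACTIONS, key=lambda a2: Q[(s, a2)])
        s_next, r_env, done = step(s, a)
        best_next = 0.0 if done else max(Q[(s_next, a2)] for a2 in ACTIONS)
        # Standard temporal-difference update, using the shaped reward.
        Q[(s, a)] += ALPHA * (shaped(r_env, s, s_next)
                              + GAMMA * best_next - Q[(s, a)])
        s = s_next
        if done:
            break
```

Because the shaping term is potential-based, the classical result of Ng, Harada, and Russell (1999) guarantees that the optimal policy is unchanged; the language-derived signal only makes the sparse reward denser, which is what enables the faster learning reported above.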