Yubin Wang, Yifeng Sun, Jiang Wu, Hao Hu, Zhiqiang Wu, Weigui Huang. 2022 IEEE Conference on Games (CoG), published 2022-08-21. DOI: 10.1109/CoG51982.2022.9893651
Reinforcement Learning using Reward Expectations in Scenarios with Aleatoric Uncertainties
In scenarios with aleatoric uncertainties, the reward an agent receives when executing the same action in the same state is random, which can reduce the stability and convergence speed of reinforcement learning algorithms. However, in most scenarios, reward functions are regular and their expectations are fixed, and these expectations can be obtained from models or from sample statistics. This paper discusses the distributional relationship between reward functions and value functions in scenarios with aleatoric uncertainties and proves the feasibility of using reward expectations for reinforcement learning. Finally, experiments show that algorithms achieve better stability and convergence speed when using reward expectations rather than random rewards.
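A minimal sketch of the core idea, assuming a tabular Q-learning setup (the toy environment, Gaussian noise model, and hyperparameters below are illustrative, not from the paper): when the reward for a state-action pair is a random variable with a known mean, the TD update can substitute the expectation E[r] for the sampled reward, removing one source of variance from the learning target.

```python
import random

def q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])

# Toy two-state chain; the true mean reward of each (state, action) pair is assumed known.
mean_reward = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 0.0, (1, 1): 1.0}

def step(s, a, noise=1.0):
    """Aleatoric uncertainty: the sampled reward is the mean plus zero-mean Gaussian noise."""
    r = mean_reward[(s, a)] + random.gauss(0.0, noise)
    return r, (s + a) % 2  # deterministic transition, for simplicity

def run(use_expectation, steps=2000, seed=0):
    random.seed(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]
    s = 0
    for _ in range(steps):
        a = random.randrange(2)          # uniform exploration
        r, s_next = step(s, a)
        if use_expectation:
            r = mean_reward[(s, a)]      # replace the noisy sample with E[r]
        q_update(q, s, a, r, s_next)
        s = s_next
    return q

q_noisy = run(use_expectation=False)
q_expected = run(use_expectation=True)
```

Because E[r] is the mean of the sampled reward, both variants share the same fixed point; the expectation variant merely has lower-variance targets, which is the stability and convergence benefit the abstract describes.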