Reinforcement Learning with Reward Shaping and Hybrid Exploration in Sparse Reward Scenes

Yulong Yang, Weihua Cao, Linwei Guo, Chao Gan, Min Wu
Published in: 2023 IEEE 6th International Conference on Industrial Cyber-Physical Systems (ICPS), 2023-05-08.
DOI: 10.1109/ICPS58381.2023.10128012

Abstract

High-precision modeling of industrial systems is difficult and costly, so model-free intelligent control methods, represented by reinforcement learning, have been widely applied in industrial systems. However, the difficulty of evaluating production states and the low value density of process data cause sparse rewards, which lead to poor reinforcement learning performance. To overcome this difficulty in sparse reward scenes, a reinforcement learning method with reward shaping and hybrid exploration is proposed. By refining the reward distribution over the environment's state space, reward shaping makes the state-value estimation of reinforcement learning more accurate. By improving the reward distribution along the time dimension, hybrid exploration makes the iteration of reinforcement learning more efficient and more stable. Finally, the effectiveness of the proposed method is verified by simulations.
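The abstract does not give the paper's exact shaping function, but the standard way to densify a sparse reward over the state space is potential-based reward shaping (Ng et al., 1999), which adds F(s, s') = γ·Φ(s') − Φ(s) to the environment reward without changing the optimal policy. The sketch below is an illustrative assumption, not the authors' method: `phi`, `sparse_reward`, and the 1-D chain environment are all hypothetical names chosen for the example.

```python
# Illustrative sketch of potential-based reward shaping on a 1-D chain
# with a sparse terminal reward (not the paper's exact method).

GAMMA = 0.99
GOAL = 10  # terminal state of the chain


def phi(state: int) -> float:
    # Potential function: negative distance to the goal,
    # so transitions that move closer to the goal gain reward.
    return -abs(GOAL - state)


def sparse_reward(state: int, next_state: int) -> float:
    # Original sparse reward: +1 only when the goal is reached.
    return 1.0 if next_state == GOAL else 0.0


def shaped_reward(state: int, next_state: int) -> float:
    # Shaped reward = sparse reward + F(s, s') = gamma * phi(s') - phi(s).
    # Potential-based shaping preserves the optimal policy while
    # providing a learning signal at every step, not just at the goal.
    return sparse_reward(state, next_state) + GAMMA * phi(next_state) - phi(state)
```

Under this construction, a step toward the goal (e.g. state 5 → 6) yields a positive shaped reward even though the sparse reward is zero, while a step away (6 → 5) yields a negative one, which is the "perfected reward distribution in the state space" the abstract refers to.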