Reinforcement Learning with Reward Shaping and Hybrid Exploration in Sparse Reward Scenes
Yulong Yang, Weihua Cao, Linwei Guo, Chao Gan, Min Wu
2023 IEEE 6th International Conference on Industrial Cyber-Physical Systems (ICPS), published 2023-05-08
DOI: 10.1109/ICPS58381.2023.10128012 (https://doi.org/10.1109/ICPS58381.2023.10128012)
Citation count: 0
Abstract
High-precision modeling of industrial systems is difficult and costly. Model-free intelligent control methods, represented by reinforcement learning, have been widely applied in industrial systems. The difficulty of evaluating production states and the low value density of processing data cause sparse rewards, which lead to insufficient performance of reinforcement learning. To overcome this difficulty in sparse-reward scenes, a reinforcement learning method with reward shaping and hybrid exploration is proposed. By improving the distribution of rewards over the state space of the environment, reward shaping makes the state-value estimation of reinforcement learning more accurate. By improving the distribution of rewards over the time dimension, hybrid exploration makes the iteration of reinforcement learning more efficient and more stable. Finally, the effectiveness of the proposed method is verified by simulations.
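The abstract does not spell out the shaping function used in the paper. As a generic illustration of how reward shaping can densify a sparse reward over the state space, the sketch below uses classical potential-based shaping, which adds the term gamma * phi(s') - phi(s) to the environment reward. The chain environment, the `GOAL` state, the discount `GAMMA`, and the distance-based `potential` are all hypothetical choices made for this example and are not taken from the paper.

```python
# Generic potential-based reward shaping on a sparse-reward chain.
# Illustrative assumption only: this is NOT the method proposed in the paper.

GAMMA = 0.9  # discount factor (assumed value)
GOAL = 5     # hypothetical goal state on a chain of states 0..5


def sparse_reward(next_state: int) -> float:
    """Environment reward: nonzero only when the goal state is reached."""
    return 1.0 if next_state == GOAL else 0.0


def potential(state: int) -> float:
    """Hypothetical potential: negative distance to the goal (zero at the goal)."""
    return -float(abs(GOAL - state))


def shaped_reward(state: int, next_state: int) -> float:
    """Sparse reward plus the shaping term gamma * phi(s') - phi(s)."""
    shaping = GAMMA * potential(next_state) - potential(state)
    return sparse_reward(next_state) + shaping


if __name__ == "__main__":
    # Print the sparse and shaped reward for a step toward and away from the goal.
    for s, s_next in [(0, 1), (1, 0), (4, 5)]:
        print(s, s_next, sparse_reward(s_next), shaped_reward(s, s_next))
```

With these assumed values, a step toward the goal receives a positive shaped reward and a step away receives a negative one, even though the underlying sparse reward is zero for both transitions.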