Reinforcement Learning with Reward Shaping and Hybrid Exploration in Sparse Reward Scenes

Yulong Yang, Weihua Cao, Linwei Guo, Chao Gan, Min Wu
Published in: 2023 IEEE 6th International Conference on Industrial Cyber-Physical Systems (ICPS), 2023-05-08.
DOI: 10.1109/ICPS58381.2023.10128012

Abstract

High-precision modeling of industrial systems is difficult and costly, so model-free intelligent control methods, represented by reinforcement learning, have been widely applied in industrial systems. However, the difficulty of evaluating production states and the low value density of process data cause sparse rewards, which lead to poor reinforcement learning performance. To overcome this difficulty in sparse reward scenes, a reinforcement learning method with reward shaping and hybrid exploration is proposed. By refining the reward distribution over the environment's state space, reward shaping makes the state-value estimation of reinforcement learning more accurate. By improving the reward distribution along the time dimension, hybrid exploration makes the iteration of reinforcement learning more efficient and more stable. Finally, the effectiveness of the proposed method is verified by simulations.
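The abstract does not give the paper's exact shaping function, but the standard way to densify a sparse reward over the state space is potential-based reward shaping (Ng et al., 1999), which adds F(s, s') = γ·Φ(s') − Φ(s) to the environment reward without changing the optimal policy. The sketch below is an illustrative assumption, not the authors' method: `phi`, `sparse_reward`, and the 1-D chain environment are all hypothetical names chosen for the example.

```python
# Illustrative sketch of potential-based reward shaping on a 1-D chain
# with a sparse terminal reward (not the paper's exact method).

GAMMA = 0.99
GOAL = 10  # terminal state of the chain


def phi(state: int) -> float:
    # Potential function: negative distance to the goal,
    # so transitions that move closer to the goal gain reward.
    return -abs(GOAL - state)


def sparse_reward(state: int, next_state: int) -> float:
    # Original sparse reward: +1 only when the goal is reached.
    return 1.0 if next_state == GOAL else 0.0


def shaped_reward(state: int, next_state: int) -> float:
    # Shaped reward = sparse reward + F(s, s') = gamma * phi(s') - phi(s).
    # Potential-based shaping preserves the optimal policy while
    # providing a learning signal at every step, not just at the goal.
    return sparse_reward(state, next_state) + GAMMA * phi(next_state) - phi(state)
```

Under this construction, a step toward the goal (e.g. state 5 → 6) yields a positive shaped reward even though the sparse reward is zero, while a step away (6 → 5) yields a negative one, which is the "perfected reward distribution in the state space" the abstract refers to.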