Policy Transition of Reinforcement Learning for an Agent Based SCM System

2006 4th IEEE International Conference on Industrial Informatics, Pub Date: 2006-08-01, DOI: 10.1109/INDIN.2006.275663

Gang Zhao, R. Sun

Citations: 6

Abstract

Reinforcement learning (RL) has been successfully applied to several dynamic and unpredictable domains. Supply Chain Management (SCM) is an NP-hard problem, and some proposed RL methods perform better than traditional tools for dynamic problem solving in SCM. RL supports online learning and performs efficiently in some applications, but an RL agent reacts to sudden changes in SCM demand more poorly than some heuristic methods, because RL's trial-and-error learning is time-consuming in practice. By investigating an efficient policy transition mechanism for RL, that is, how to map the existing policies of a previous task to new policies for a changed task, this paper proposes a novel RL-agent-based SCM system that reduces the time the RL agent needs to learn in a dynamic environment. As a result, the RL agent derives the maximal profit using RL techniques when jobs arrive with a stable distribution. Further, through the policy transition mechanism, the RL agent makes optimal procurement decisions that satisfy the requirements of sudden changes in the supply chain network.
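The abstract does not specify the learning algorithm, the state and action encoding, or how old policies are mapped onto the changed task. The sketch below is therefore a hypothetical illustration of the general idea, not the authors' method: tabular Q-learning on a toy single-item procurement task, where "policy transition" is modeled as warm-starting the changed task's Q-table by clipping each new state and action to the closest one that existed in the old task. The names `ProcurementTask`, `step`, `q_learning`, and `transfer_policy`, the profit model, and all parameter values are assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcurementTask:
    """Hypothetical single-item procurement task (illustrative assumption)."""
    max_inventory: int          # states: inventory levels 0..max_inventory
    max_order: int              # actions: order quantities 0..max_order
    demand: tuple               # ((demand_value, probability), ...)
    price: float = 5.0          # revenue per unit sold
    unit_cost: float = 3.0      # cost per unit ordered
    holding_cost: float = 0.5   # cost per unit left in stock


def step(task, inventory, order, rng):
    """Receive an order, serve random demand, return (next_inventory, profit).
    Units beyond max_inventory are paid for but discarded."""
    values, weights = zip(*task.demand)
    demand = rng.choices(values, weights=weights, k=1)[0]
    stock = min(inventory + order, task.max_inventory)
    sold = min(stock, demand)
    left = stock - sold
    profit = task.price * sold - task.unit_cost * order - task.holding_cost * left
    return left, profit


def new_q_table(task):
    """Zero-initialised Q-table indexed as q[state][action]."""
    return [[0.0] * (task.max_order + 1) for _ in range(task.max_inventory + 1)]


def greedy(q, state):
    """Action with the highest Q-value (lowest index on ties)."""
    row = q[state]
    return max(range(len(row)), key=row.__getitem__)


def q_learning(task, q, episodes, rng, alpha=0.1, gamma=0.9, epsilon=0.1, horizon=20):
    """Tabular epsilon-greedy Q-learning that updates q in place.
    Returns the total profit collected in each episode."""
    totals = []
    for _ in range(episodes):
        state, total = 0, 0.0
        for _ in range(horizon):
            if rng.random() < epsilon:
                action = rng.randrange(task.max_order + 1)  # explore
            else:
                action = greedy(q, state)                    # exploit
            nxt, profit = step(task, state, action, rng)
            target = profit + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state, total = nxt, total + profit
        totals.append(total)
    return totals


def transfer_policy(q_old, old_task, new_task):
    """Hypothetical policy transition: initialise the changed task's Q-table by
    clipping each new state/action to the closest one that existed in the old task."""
    q_new = new_q_table(new_task)
    for s in range(new_task.max_inventory + 1):
        s_old = min(s, old_task.max_inventory)
        for a in range(new_task.max_order + 1):
            q_new[s][a] = q_old[s_old][min(a, old_task.max_order)]
    return q_new


if __name__ == "__main__":
    rng = random.Random(0)
    # Phase 1: learn under a stable (hypothetical) demand distribution.
    stable = ProcurementTask(max_inventory=10, max_order=5, demand=((2, 0.5), (3, 0.5)))
    q_stable = new_q_table(stable)
    q_learning(stable, q_stable, episodes=500, rng=rng)
    # Phase 2: sudden demand shift; warm-start from the old policy, then fine-tune.
    changed = ProcurementTask(max_inventory=15, max_order=8, demand=((5, 0.5), (7, 0.5)))
    q_changed = transfer_policy(q_stable, stable, changed)
    q_learning(changed, q_changed, episodes=100, rng=rng)
    print("greedy orders for inventory 0..5:", [greedy(q_changed, s) for s in range(6)])
```

The warm start is the design choice being illustrated: after the demand shift, the agent begins from the transferred Q-values rather than from zeros, which may shorten the trial-and-error phase that the abstract identifies as the bottleneck. How much it helps depends on how similar the two tasks are, and the clipping map is only one of many possible state mappings.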
