Reinforcement Learning for Optimizing Can-Order Policy with the Rolling Horizon Method

Systems Pub Date: 2023-07-07 DOI: 10.3390/systems11070350

J. Noh

Citations: 0

Abstract

This study presents a novel approach that combines a mixed-integer linear programming (MILP) model for periodic inventory management with reinforcement learning (RL) algorithms. The rolling horizon method (RHM) is a multi-period optimization approach used to incorporate new information as market conditions are updated. A limitation of the RHM is that its prediction horizon is difficult to determine; to overcome this, a dynamic RHM is developed in which RL algorithms optimize the prediction horizon. The state vector consists of the order-up-to level, real demand, total cost, holding cost, and backorder cost, whereas the action comprises the prediction horizon and the forecast demand for the next time step. The performance of the proposed model was validated through two experiments covering stable and uncertain demand patterns. The results show that the proposed approach is effective for inventory management, particularly when the proximal policy optimization (PPO) algorithm is used for training rather than other RL algorithms. The study contributes to both the theoretical and practical aspects of multi-item inventory management.
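To make the state and action described above concrete, the following is a minimal sketch of a rolling-horizon loop, not the paper's model. It assumes a single item and zero lead time, and it replaces the MILP solve with a simple stand-in rule (order-up-to level = prediction horizon × forecast demand). The names `State`, `fixed_policy`, and `run_rolling_horizon`, as well as the cost parameters `hold` and `back`, are hypothetical; `fixed_policy` is only a placeholder for a trained RL agent such as one trained with PPO.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class State:
    """State vector named in the abstract (single-item simplification)."""
    order_up_to: float
    real_demand: float
    total_cost: float
    holding_cost: float
    backorder_cost: float


# Action: (prediction horizon, forecast demand for the next time step).
Action = Tuple[int, float]


def fixed_policy(state: State) -> Action:
    """Placeholder for an RL agent (assumption): horizon of 2 periods and a
    naive forecast equal to the last observed demand."""
    return 2, state.real_demand


def run_rolling_horizon(
    demands: Sequence[float],
    policy: Callable[[State], Action],
    hold: float = 1.0,   # hypothetical holding cost per unit per period
    back: float = 5.0,   # hypothetical backorder cost per unit per period
) -> List[State]:
    inventory = 0.0
    state = State(0.0, 0.0, 0.0, 0.0, 0.0)
    history: List[State] = []
    for demand in demands:
        horizon, forecast = policy(state)
        # Stand-in for the MILP solve over the prediction horizon (assumption):
        # cover the forecast demand for `horizon` periods.
        order_up_to = horizon * forecast
        order = max(0.0, order_up_to - inventory)
        # Only the first-period decision is executed; then the window rolls on.
        inventory += order - demand
        holding = hold * max(inventory, 0.0)
        backorder = back * max(-inventory, 0.0)
        state = State(order_up_to, demand, holding + backorder, holding, backorder)
        history.append(state)
    return history


history = run_rolling_horizon([10.0, 10.0, 10.0], fixed_policy)
```

In an RL setting, the negative of `total_cost` at each step would be a natural reward signal, and the agent would replace `fixed_policy` by choosing the horizon and forecast from the observed state.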

