Deep Reinforcement Learning for a Multi-Objective Online Order Batching Problem

International Conference on Automated Planning and Scheduling | Pub Date: 2022-06-13 | DOI: 10.1609/icaps.v32i1.19829

M. Beeks, Reza Refaei Afshar, Yingqian Zhang, R. Dijkman, Claudy van Dorst, S. D. Looijer


Citations: 3

Abstract



On-time delivery and low service costs are two important performance metrics in warehousing operations. This paper proposes a Deep Reinforcement Learning (DRL) based approach to the online Order Batching and Sequence Problem (OBSP) that optimizes both objectives. To learn how to balance the trade-off between the two objectives, we introduce a Bayesian optimization framework that shapes the reward function of the DRL agent, so that the influence of each objective on learning is adjusted to the environment. We compare our approach with several heuristics on problem instances of real-world size, in which thousands of orders arrive dynamically per hour. We show that the Proximal Policy Optimization (PPO) algorithm with Bayesian optimization outperforms the heuristics on both objectives in all tested scenarios. In addition, it finds different weights for the reward-function components in different scenarios, indicating that it can learn how to set the importance of the two objectives under different environments. We also provide a policy analysis of the learned DRL agent, in which a decision tree is used to infer decision rules and thereby make the DRL approach interpretable.
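The abstract does not give the form of the reward function, so the following is only a minimal sketch of the general idea of a weighted reward over two objectives. It assumes a weighted-sum penalty; the names `StepOutcome`, `shaped_reward`, `preferred`, `w_tardy` and `w_cost`, the two candidate actions, and all numbers are hypothetical and not taken from the paper. In the paper's setting the weights would be chosen by Bayesian optimization; here they are simply passed in by hand.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class StepOutcome:
    """Hypothetical outcome of one batching decision (illustrative fields)."""
    tardy_orders: int     # orders that missed their deadline
    picking_cost: float   # service cost incurred, in arbitrary units


def shaped_reward(outcome: StepOutcome, w_tardy: float, w_cost: float) -> float:
    """Weighted-sum reward (an assumption): both terms act as penalties."""
    return -(w_tardy * outcome.tardy_orders + w_cost * outcome.picking_cost)


def preferred(outcomes: Mapping[str, StepOutcome],
              w_tardy: float, w_cost: float) -> str:
    """Return the name of the outcome with the highest shaped reward."""
    return max(outcomes,
               key=lambda name: shaped_reward(outcomes[name], w_tardy, w_cost))


# Two made-up alternatives: release a batch now (on time but costly),
# or wait for more orders (cheaper per order but two orders become tardy).
candidates = {
    "batch_now": StepOutcome(tardy_orders=0, picking_cost=10.0),
    "wait_for_more": StepOutcome(tardy_orders=2, picking_cost=4.0),
}

if __name__ == "__main__":
    # A heavy tardiness weight favors on-time delivery.
    print(preferred(candidates, w_tardy=5.0, w_cost=1.0))
    # Equal weights favor the cheaper option despite the tardy orders.
    print(preferred(candidates, w_tardy=1.0, w_cost=1.0))
```

Under these made-up numbers the preferred action flips as the weights change, which illustrates why the weights matter and why tuning them per scenario can lead an agent to different trade-offs.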
