Deep reinforcement learning for dynamic order picking in warehouse operations

Impact factor 4.1; Zone 2 (Engineering & Technology); JCR Q2, Computer Science, Interdisciplinary Applications

Computers & Operations Research, vol. 182, Article 107112. Pub Date: 2025-05-08. DOI: 10.1016/j.cor.2025.107112

Sasan Mahmoudinazlou, Abhay Sobhanan, Hadi Charkhgard, Ali Eshragh, George Dunn

Citations: 0

Abstract

Order picking is a pivotal operation in warehouses that directly impacts overall efficiency and profitability. This study addresses the dynamic order picking problem, a significant concern in modern warehouse management, where real-time adaptation to fluctuating order arrivals and efficient picker routing are crucial. Traditional methods, which often depend on static optimization algorithms designed around fixed order sets for the picker routing, fall short in addressing the challenges of this dynamic environment. To overcome these challenges, we propose a Deep Reinforcement Learning (DRL) framework tailored for single-block warehouses equipped with an autonomous picking device. By dynamically optimizing picker routes, our approach significantly reduces order throughput times and unfulfilled orders, particularly under high order arrival rates. We benchmark our DRL model against established algorithms, utilizing instances generated based on standard practices in the order picking literature. Experimental results demonstrate the superiority of our DRL model over benchmark algorithms. For example, at a high order arrival rate of 0.09 (i.e., 9 orders per 100 units of time on average), our approach achieves an order fulfillment rate of approximately 98%, compared to the 82% fulfillment rate observed with benchmarking algorithms. We further investigate the integration of a hyperparameter in the reward function that allows for flexible balancing between distance traveled and order completion time. Finally, we demonstrate the robustness of our DRL model on out-of-sample test instances.
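The abstract does not give the form of the reward function, so the following is only a minimal sketch of how a single hyperparameter could trade off distance traveled against order completion time. The weighted-sum form, the name `alpha`, and the `StepOutcome` fields are all assumptions made for illustration, not the authors' formulation. The helper `expected_orders` simply restates the abstract's arithmetic for an average arrival rate (0.09 orders per unit of time over 100 units of time).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepOutcome:
    """Result of one picker move (hypothetical fields, not from the paper)."""
    distance: float          # distance traveled during this step
    completion_times: tuple  # completion time of each order finished in this step


def reward(outcome: StepOutcome, alpha: float) -> float:
    """Negative weighted cost (assumed form).

    alpha weights distance traveled; (1 - alpha) weights order completion time.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    time_cost = sum(outcome.completion_times)
    return -(alpha * outcome.distance + (1.0 - alpha) * time_cost)


def expected_orders(arrival_rate: float, horizon: float) -> float:
    """Mean number of order arrivals over a horizon at a constant average rate."""
    return arrival_rate * horizon


if __name__ == "__main__":
    step = StepOutcome(distance=10.0, completion_times=(4.0, 2.0))
    for alpha in (0.0, 0.5, 1.0):
        print(alpha, reward(step, alpha))
    print(expected_orders(0.09, 100.0))
```

Under this assumed form, setting `alpha` close to 1 favors short routes, while setting it close to 0 favors finishing orders quickly.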

Source journal

Computers & Operations Research (Engineering & Technology: Industrial Engineering)

CiteScore: 8.60

Self-citation rate: 8.70%

Articles published: 292

Review time: 8.5 months

About the journal: Operations research and computers meet in a large number of scientific fields, many of which are of vital current concern to our troubled society. These include, among others, ecology, transportation, safety, reliability, urban planning, economics, inventory control, investment strategy and logistics (including reverse logistics). Computers & Operations Research provides an international forum for the application of computers and operations research techniques to problems in these and related fields.