Capacity planning in logistics corridors: Deep reinforcement learning for the dynamic stochastic temporal bin packing problem

IF 8.3 | Zone 1, Engineering & Technology | Q1 ECONOMICS

Transportation Research Part E-Logistics and Transportation Review | Pub Date: 2024-08-31 | DOI: 10.1016/j.tre.2024.103742


Citations: 0

Abstract


Capacity planning in logistics corridors: Deep reinforcement learning for the dynamic stochastic temporal bin packing problem

This paper addresses the challenge of managing uncertainty in the daily capacity planning of a terminal in a corridor-based logistics system. Corridor-based logistics systems facilitate the exchange of freight between two distinct regions, usually involving industrial and logistics clusters. In this context, we introduce the dynamic stochastic temporal bin packing problem. It models the assignment of individual containers to carriers’ trucks over discrete time units in real time. We formulate it as a Markov decision process (MDP). Two distinguishing characteristics of our problem are the stochastic nature of the time-dependent availability of containers, i.e., container delays, and the continuous-time, or dynamic, aspect of the planning, where a container announcement may occur at any moment during the planning horizon. We introduce an innovative real-time planning algorithm based on Proximal Policy Optimization (PPO), a Deep Reinforcement Learning (DRL) method, to allocate individual containers to eligible carriers in real time. In addition, we propose some practical heuristics and two novel rolling-horizon batch-planning methods based on (stochastic) mixed-integer programming (MIP), which can be interpreted as computational information relaxation bounds because they delay decision making. The results show that our proposed DRL method outperforms the practical heuristics and, unlike the stochastic MIP-based approach, scales effectively to larger problems, making our DRL method a practically appealing solution.
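To make the assignment setting described in the abstract concrete, the following is a minimal sketch of an online first-fit rule for a generic temporal bin packing instance. It is not one of the paper's heuristics, its MDP, or its PPO policy. The names `Truck`, `assign_online`, the unit container size, and the per-time-unit capacity are all illustrative assumptions, and the sketch ignores container delays and carrier eligibility.

```python
from dataclasses import dataclass, field


@dataclass
class Truck:
    carrier: str
    capacity: int  # assumed: max containers on this truck in any one time unit
    load: dict = field(default_factory=dict)  # time unit -> containers on board

    def fits(self, start, end):
        # A container fits if the truck has spare capacity in every time unit of [start, end).
        return all(self.load.get(t, 0) < self.capacity for t in range(start, end))

    def add(self, start, end):
        # Occupy one unit of capacity in every time unit of [start, end).
        for t in range(start, end):
            self.load[t] = self.load.get(t, 0) + 1


def assign_online(trucks, announcements):
    """Process announcements in arrival order; each decision is made immediately.

    Each announcement is (container_id, start, end) with the container
    occupying time units start, ..., end - 1 (hypothetical format).
    Returns container_id -> carrier name, or None if no truck has room.
    """
    assignment = {}
    for container_id, start, end in announcements:
        assignment[container_id] = None
        for truck in trucks:
            if truck.fits(start, end):
                truck.add(start, end)
                assignment[container_id] = truck.carrier
                break
    return assignment


trucks = [Truck("A", capacity=1), Truck("B", capacity=1)]
announcements = [("c1", 0, 2), ("c2", 1, 3), ("c3", 2, 4), ("c4", 1, 2)]
result = assign_online(trucks, announcements)
```

In this toy instance, `c4` is left unassigned because both trucks are already occupied in time unit 1. A batch planner that waited to see all four announcements could not do better here, but in general delaying decisions gives more information, which is the intuition behind the abstract's remark that the rolling-horizon batch methods act as information relaxation bounds.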


Source journal

Transportation Research Part E-Logistics and Transportation Review (Engineering & Technology - Engineering: Civil)

CiteScore: 16.20

Self-citation rate: 16.00%

Articles published: 285

Review time: 62 days

About the journal: Transportation Research Part E: Logistics and Transportation Review is a reputable journal that publishes high-quality articles covering a wide range of topics in the field of logistics and transportation research. The journal welcomes submissions on various subjects, including transport economics, transport infrastructure and investment appraisal, evaluation of public policies related to transportation, empirical and analytical studies of logistics management practices and performance, logistics and operations models, and logistics and supply chain management. Part E aims to provide informative and well-researched articles that contribute to the understanding and advancement of the field. The content of the journal is complementary to other prestigious journals in transportation research, such as Transportation Research Part A: Policy and Practice, Part B: Methodological, Part C: Emerging Technologies, Part D: Transport and Environment, and Part F: Traffic Psychology and Behaviour. Together, these journals form a comprehensive and cohesive reference for current research in transportation science.