Occasional drivers’ allocation using two-stage deep reinforcement learning – a case study of an e-grocery platform

Impact Factor: 3.3 | JCR: Q3 | Category: Transportation

Case Studies on Transport Policy | Pub Date: 2025-09-16 | DOI: 10.1016/j.cstp.2025.101606

Nguyen Thi Tam Thanh, Nguyen Van Hop

Volume 22, Article 101606. Full text: https://www.sciencedirect.com/science/article/pii/S2213624X25002433

Citations: 0

Abstract

This study addresses a stochastic crowd-shipping problem with multiple customer types and uncertain availability of occasional drivers. The objective is to minimize the total delivery cost through efficient customer allocation and routing for dedicated and occasional drivers. A novel two-stage Deep Reinforcement Learning approach is introduced to handle large-scale instances efficiently. In the first stage, a Single-Layer Feed-Forward Neural Network is used to train and validate the estimated reward function of random sequences with an on-policy method. Customer types are classified by taking appropriate actions based on a Mixed-Integer Programming solution. A modified Proximal Policy Optimization algorithm then updates the policy over multiple epochs during training. After allocation, optimal routes for occasional drivers are determined by a Capacitated Vehicle Routing model. A case study of an online e-grocery platform illustrates the efficiency of the proposed system. Experimental results indicate that the proposed approach outperforms the previous approach by approximately 17%.
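The abstract does not say how the Proximal Policy Optimization step is modified, so the following is only a minimal sketch of the standard clipped surrogate loss that PPO reuses over several epochs on the same batch. The function name `ppo_clipped_loss`, the parameter `clip_eps`, and the sample numbers are illustrative assumptions and are not taken from the paper.

```python
import math


def ppo_clipped_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate loss (a value to minimize).

    Assumption: this is the textbook objective, not the paper's modified variant.
    Each argument is a list with one entry per sampled action.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the data-collecting policy.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic choice: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    # Negate the mean so that minimizing the loss maximizes the surrogate.
    return -total / len(advantages)


if __name__ == "__main__":
    # Hypothetical batch of three allocation decisions.
    old_lp = [-1.2, -0.7, -2.0]
    new_lp = [-1.0, -0.9, -1.5]
    adv = [0.5, -0.3, 1.1]
    print(ppo_clipped_loss(new_lp, old_lp, adv))
```

In a multi-epoch update, `old_log_probs` and `advantages` stay fixed for the batch while `new_log_probs` is recomputed after each gradient step; the clipping limits how far repeated passes can move the policy from the one that collected the data.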


Source journal

Case Studies on Transport Policy (Transportation)

CiteScore: 5.00

Self-citation rate: 12.00%

Articles published: 222