Occasional drivers’ allocation using two-stage deep reinforcement learning – a case study of an e-grocery platform

Authors: Nguyen Thi Tam Thanh, Nguyen Van Hop
Journal: Case Studies on Transport Policy, vol. 22, Article 101606
Published: 2025-09-16
DOI: 10.1016/j.cstp.2025.101606
URL: https://www.sciencedirect.com/science/article/pii/S2213624X25002433
Publication type: Journal Article (not open access)
Impact factor: 3.3 (JCR Q3, Transportation)
Citations: 0

Abstract

This study addresses a stochastic crowd-shipping problem, considering multiple customer types and the uncertain availability of occasional drivers. The objective is to minimize the total delivery cost through efficient customer allocation and routing for dedicated and occasional drivers. A novel two-phase Deep Reinforcement Learning approach is introduced to efficiently handle large-scale instances. In the first stage, a Single-Layer Feed-Forward Neural Network is implemented to train and validate the estimated reward function of random sequences using an on-policy method. The customer types are classified by taking appropriate actions using a Mixed-Integer Programming solution. Then, a modified Proximal Policy Optimization algorithm updates policies over multiple epochs during training. After allocation, optimal routes for occasional drivers are determined by a Capacitated Vehicle Routing model. A case study of an online e-grocery platform illustrates the efficiency of the proposed system. Experimental results indicate that the proposed approach outperforms the previous approach by approximately 17%.
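
The abstract mentions a modified Proximal Policy Optimization algorithm but does not describe the modification. As a point of reference only, the sketch below computes the clipped surrogate objective of standard PPO, not the authors' variant. The function name `ppo_clipped_objective` and the default `clip_eps=0.2` are illustrative assumptions and do not come from the paper.

```python
import math


def ppo_clipped_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective of standard PPO (to be maximized).

    new_logp, old_logp: log-probabilities of the sampled actions under the
    current and the data-collecting policy; advantages: advantage estimates.
    All names and the clip_eps default are assumptions for illustration.
    """
    if not (len(new_logp) == len(old_logp) == len(advantages)):
        raise ValueError("inputs must have the same length")
    if not advantages:
        raise ValueError("inputs must be non-empty")

    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the current and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to the interval [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic bound: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # Hypothetical batch of three samples.
    value = ppo_clipped_objective(
        new_logp=[-1.0, -0.5, -2.0],
        old_logp=[-1.0, -1.0, -1.5],
        advantages=[2.0, 1.0, -1.0],
    )
    print(value)
```

Clipping removes the incentive to move the new policy far from the one that collected the data, which is what allows the same batch to be reused over multiple epochs, as the abstract describes.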

Source journal metrics: CiteScore 5.00; self-citation rate 12.00%; articles published: 222