Concurrent Order Dispatch for Instant Delivery with Time-Constrained Actor-Critic Reinforcement Learning

2021 IEEE Real-Time Systems Symposium (RTSS); publication date: 2021-12-01; DOI: 10.1109/rtss52674.2021.00026

Baoshen Guo, Shuai Wang, Yi Ding, Guang Wang, Suining He, Desheng Zhang, Tian He

Citations: 12

Abstract

Instant delivery has developed rapidly in recent years and, owing to its timeliness and convenience, has significantly changed people's lifestyles. In instant delivery, order dispatch is concurrent: couriers continuously accept new orders and deliver multiple orders in a single delivery trip (i.e., a batch). The delivery times of orders in a batch often overlap and are interdependent, and the pickup and delivery sequence of the orders already in a batch changes dynamically with time constraints and the real-time overdue probability (i.e., the rate of deliveries not finished within the promised time). Most existing order dispatch mechanisms are designed either for independent order dispatch or for concurrent delivery without strict time constraints, and are therefore unable to handle real-time concurrent dispatch with strict time constraints in on-demand instant delivery. To address this challenge, we propose TCAC-Dispatch, a concurrent dispatch system based on Time-Constrained Actor-Critic reinforcement learning, to increase long-term overall revenue and reduce the overdue rate. Specifically, we design a deep matching network (DMN) with a variable action space, which integrates state embeddings (including route-behavior encoding) and action embeddings into a long-term matching value. An Actor-Critic model then addresses the concurrent order dispatch problem under strict time constraints and stochastic demand and supply in instant delivery. An estimated-time-based action pruning module is designed to guarantee the time constraints and to accelerate both training and dispatching. We evaluate TCAC-Dispatch with one month of data covering 36.48 million orders and 42,000 couriers, collected from Eleme, one of the largest instant delivery companies in China. Empirical experiments conducted on a data-driven emulator deployed in Eleme's development environment show that our method increases total revenue by 22% and reduces the overdue rate by 21.6%.
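The estimated-time-based action pruning and the overdue-rate metric mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation, and the abstract does not specify their ETA model: the `Order`/`Courier` classes, the Manhattan-distance `travel_time`, the constant `SPEED`, and the rule of appending the new order's pickup and dropoff to the end of a courier's planned route are all hypothetical stand-ins. Under these assumptions, a courier stays in the candidate action set only if every order in its batch, including the new one, is still estimated to arrive before its promised time; `overdue_rate` computes the share of deliveries not finished within the promised time.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[float, float]


@dataclass
class Order:
    order_id: str
    pickup: Point
    dropoff: Point
    promised_time: float  # promised delivery deadline, minutes from now (assumed unit)


@dataclass
class Courier:
    courier_id: str
    location: Point
    # Planned stops in visiting order: ("pickup" | "dropoff", order).
    route: List[Tuple[str, Order]] = field(default_factory=list)


SPEED = 0.25  # hypothetical courier speed, distance units per minute


def travel_time(a: Point, b: Point) -> float:
    """Manhattan-distance travel time; a stand-in for a real ETA model."""
    return (abs(a[0] - b[0]) + abs(a[1] - b[1])) / SPEED


def estimated_delivery_times(start: Point, route: List[Tuple[str, Order]]) -> Dict[str, float]:
    """Walk the planned stops in order and record the ETA of each dropoff."""
    t, pos, etas = 0.0, start, {}
    for kind, order in route:
        target = order.pickup if kind == "pickup" else order.dropoff
        t += travel_time(pos, target)
        pos = target
        if kind == "dropoff":
            etas[order.order_id] = t
    return etas


def is_feasible(courier: Courier, new_order: Order) -> bool:
    """True if appending the new order keeps every order in the batch on time."""
    route = courier.route + [("pickup", new_order), ("dropoff", new_order)]
    etas = estimated_delivery_times(courier.location, route)
    deadlines = {order.order_id: order.promised_time for _, order in route}
    return all(eta <= deadlines[oid] for oid, eta in etas.items())


def prune_actions(couriers: List[Courier], new_order: Order) -> List[Courier]:
    """Candidate action set: couriers whose batch stays on time with the new order."""
    return [c for c in couriers if is_feasible(c, new_order)]


def overdue_rate(deliveries: List[Tuple[float, float]]) -> float:
    """Share of (actual_time, promised_time) pairs delivered after the promise."""
    if not deliveries:
        return 0.0
    late = sum(1 for actual, promised in deliveries if actual > promised)
    return late / len(deliveries)


if __name__ == "__main__":
    new = Order("o3", pickup=(1, 0), dropoff=(2, 0), promised_time=15)
    idle_near = Courier("c1", location=(0, 0))
    idle_far = Courier("c2", location=(10, 0))
    # c3 must first drop off o1, which delays the new order past its deadline.
    busy = Courier("c3", location=(0, 0),
                   route=[("dropoff", Order("o1", (0, 0), (0, 4), promised_time=20))])
    print([c.courier_id for c in prune_actions([idle_near, idle_far, busy], new)])  # ['c1']
```

Appending to the end of the route is the simplest insertion rule; since the abstract notes that pickup and delivery sequences within a batch change dynamically, a closer sketch would try every insertion position (or re-plan the route) before declaring a courier infeasible. Shrinking the candidate set this way is also what makes faster training and dispatching plausible, because a policy only has to score the couriers that survive pruning.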
